PREFACE


There are two quite distinct aspects or levels of mathematical 
statistics. The one involves elementary mathematics and the 
methodologies serve descriptive purposes. These fundamentals are 
set forth in Part I. The other aspect is essentially mathematical in 
character and the methodologies are developed for inferential purposes.
It cannot be made elementary by its very nature because
the problems are so difficult that powerful mathematical tools are
necessary to provide solutions of the problems.

In recent years great advances have been made in statistical 
theory. Methods of formulating and testing hypotheses have been 
systematically developed and a sound basis for statistical inference 
has replaced older methods involving the intuitive notions of probable
error. In this book I have elected to include some of the
classical theory and some of the simpler concepts and techniques of 
the modern theory. In short, I have made a sustained effort to write 
an up-to-date text which will serve to prepare the student for the 
really mathematical part of the theory of statistics. A knowledge 
of elementary probability theory, calculus, and determinants is presupposed.
It is also understood that the student is familiar with the
rudiments of statistics such as are given in Part I. However, if no 
preliminary course in statistics has been studied, mature students 
should be able to acquire the essential definitions and concepts in a 
rapid survey of Part I. 

Of the books which have been particularly useful in preparing the 
manuscript, I would name the following: Camp’s Mathematical 
Statistics, Fisher’s Statistical Methods For Research Workers, Fry’s 
Probability And Its Engineering Uses, Rietz’s Mathematical Statistics, 
and Wilks’ Statistical Inference. I have also derived much help from
certain papers in the literature by Professors Carver, Jackson, Rider, 
and Rietz. Specific reference to these papers is made in the text. 
A reference list of pertinent books and papers is given at the end of 
each of the last three chapters. It is recommended that some of 
these be available to the student for supplementary study in connection
with this text.
CHAPTER I

PROBABILITY AND ITS RELATION TO STATISTICAL THEORY. THE
BERNOULLI DISTRIBUTION. APPROXIMATIONS BY MEANS OF THE
NORMAL CURVE AND POISSON EXPONENTIAL FUNCTION

1. Importance. The subject of probability deals with one of the 
most interesting branches of modern mathematics and is becoming 
conspicuous for its applications in many fields of learning. This subject
is of fundamental importance, not only in the theory of insurance
and statistics, but also in various branches of the biological and
physical sciences. The following quotations from contemporary 
writers indicate the importance of probability theory in the philosophy 
of modern science. 

It was, I think, Huxley who said that six monkeys, set to strum unintelligently 
on typewriters for millions of millions of years, would be bound in time to write 
all the books in the British Museum. If we examined the last page which a 
particular monkey had typed, and found that it had chanced, in its blind strumming,
to type a Shakespeare sonnet, we should rightly regard the occurrence as a
remarkable accident, but if we looked through all the millions of pages the monkeys
had turned off in untold millions of years, we might be sure of finding a
Shakespeare sonnet somewhere amongst them, the product of the blind play of 
chance. ... 

These and other considerations have led many physicists to suppose that there 
is no determinism in events in which atoms and electrons are involved singly, and 
that the apparent determinism in large-scale events is only of a statistical nature. 
When we are dealing with atoms and electrons in crowds, the mathematical law 
of averages imposes the determinism which physical laws fail to provide. . . . 
We can only speak in terms of probabilities. 

— The Mysterious Universe, Sir James Jeans.

In order to understand the nature of knowledge about social and economic life, 
it is necessary to know something about the theory of probability; because 
knowledge in these fields, in general, is essentially indeterminate knowledge.
There are two fundamental ideas which need to be grasped in order to understand
the social sciences. The first idea is that all science is philosophical. ... The
time honored aim of philosophy has been to discover and interpret (to the extent
possible to the human mind) the characteristics of nature. By nature is meant 
all things, material and psychic, external to man, and man himself. In many 
fields the minds of men have penetrated into the mysteries of nature and have 
produced knowledge concerning them. In the physical aspects (both external 
to man and in man) great progress has been made towards the attainment of 
apparently precise knowledge, within certain definite limits; while in the field 
of the psychic the progress has been towards increasing the probabilities of truth 
of a great variety of hypotheses. But it is characteristic of the psychic aspects 
of knowledge that the facts in those fields are indeterminate, not precise, and 
apparently dynamic. Even in the physical and chemical world, the discoveries 
of recent years have emphasized a great realm of indeterminacy, particularly 
when confronting great velocities and infinitely small particles within the atom. 
Thus the second idea to grasp is that in all fields of knowledge, even the physical, 
beyond the limited range of relatively precise knowledge accumulated by man, 
there is a vast frontier of speculation. It has been the function of scientific 
method — the new tool of philosophy — to penetrate ever deeper into this realm 
of speculative knowledge. Primarily this has been made possible by the development
of the theory of probabilities.

— Elementary Statistics, James G. Smith. 

There exist in nature systems of chance causes which operate in such a way 
that the effects of these causes can be predicted — by making use of customary 
probability theory in which objective probabilities in the limiting statistical 
sense are substituted for the mathematical probabilities. 

— Economic Control of Quality of Manufactured Product, W. A. Shewhart. 

It appears likely that the further development of the theory of probability in 
the next few decades may turn out to be a major chapter in the history of science. 

— Science, January 18, 1929. 

The great extension in the use of statistics in the last two decades has been 
associated with and largely made possible by mathematical developments based 
upon the theory of probability. 

— Harold Hotelling, Journal American Statistical Association, March
Supplement, 1931.

2. Definitions. Inasmuch as the subject of probability plays an 
important role in certain phases of statistical theory, we will now 
consider some of the fundamental principles of this subject. It will be 
convenient to divide the subject into two classes, and speak of a
priori and empirical probability.

(a) A priori probability. If all the ways of obtaining successes and
failures can be analyzed into s possible mutually exclusive ways each 
of which is equally likely, and if x of these ways give successes, the 
probability of success in a single trial is x/s. 

A priori probability is concerned with that class of problems in
which a full knowledge of the conditions affecting the event in question
is known beforehand. In other words, the problem may be set up
and solved abstractly. Thus the following problems are questions of 
a priori probability: A box contains 4 white and 6 red billiard balls. 
What is the probability that two drawn will be of the same color? 
A coin is to be tossed 7 times. What is the probability that heads 
will turn up at least 3 times? A sample of telephone receivers is 
to be taken from a case containing 100 telephone receivers of which 
20 are known to be defective. What is the probability that the sample
will contain exactly 2 defectives?
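
The three questions just posed can each be answered by counting equally likely cases. The following is only a minimal Python sketch and is not part of the original text: the function names are illustrative, and since the text does not state the size of the sample of telephone receivers, the value 10 used in the last call is purely an assumption made for the sake of the example.

```python
from fractions import Fraction
from math import comb


def same_color_two_drawn(white: int, red: int) -> Fraction:
    """Probability that two balls drawn without replacement share a color."""
    favorable = comb(white, 2) + comb(red, 2)
    return Fraction(favorable, comb(white + red, 2))


def at_least_heads(tosses: int, least: int) -> Fraction:
    """Probability of at least `least` heads in `tosses` tosses of a fair coin."""
    favorable = sum(comb(tosses, k) for k in range(least, tosses + 1))
    return Fraction(favorable, 2 ** tosses)


def exactly_defective(total: int, defective: int, sample: int, want: int) -> Fraction:
    """Probability that a sample drawn without replacement holds `want` defectives."""
    good = total - defective
    ways = comb(defective, want) * comb(good, sample - want)
    return Fraction(ways, comb(total, sample))


print(same_color_two_drawn(4, 6))   # 7/15
print(at_least_heads(7, 3))         # 99/128
# The sample size 10 is an assumed value; the text leaves it unspecified.
print(exactly_defective(100, 20, 10, 2))
```

Exact fractions are used so that each result can be compared directly with a hand computation of x/s.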

There is another class of events in which it is impossible or impractical
to enumerate all of the equally likely ways in which the
event in question may succeed or fail. When this is the case it is 
necessary to estimate the probability by trial and observation. Thus 
we have 

(b) Empirical probability. If it is observed that an event has
occurred x times among s trials the ratio x/s is called the relative
frequency of success. The limit* of the ratio x/s as s is taken
indefinitely large is called the probability of success in a single trial.
In symbols we have

\[ \lim_{s \to \infty} \frac{x}{s} = p. \]

In statistical applications the limit of x/s cannot in general be determined,
but an observed relative frequency (s large) often provides
a valuable estimate of the underlying probability assumed in the
definition. For example, according to the American Experience
Mortality Table, out of 57,917 persons living aged 60, there are 1546
who die during the following year. Therefore, the relative frequency
1546/57,917 = .026693 is taken by insurance companies as the
probability that a person aged 60 will not survive another year.

*The student familiar with the theory of limits will realize that a rigorous
proof that a probability p exists as the limit of x/s as s increases would require
us to show that, given an $\epsilon > 0$, there exists a number N such that

\[ \left| \frac{x}{s} - p \right| < \epsilon \quad \text{for all } s > N. \]

It is of course obvious that we cannot prove the existence of this limit because
we cannot be sure that the difference |x/s - p| will become and remain, as s
increases, less than any assigned positive number, no matter how small. For
example, after throwing a coin 10,000 times it is possible to get a run of all heads
in the next 1000 throws. In this connection Rietz says: "That the limit exists
is an empirical assumption whose validity cannot be proved, but experience with
data in many fields has given much support to the . . . usefulness of the assumption."
(Mathematical Statistics, p. 8.)

We can, however, prove that the probability approaches certainty that x/s
will approach p as a limit as s is indefinitely increased. (See § 7.)
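
The behavior of an observed relative frequency can be illustrated by a small simulation. The sketch below is an illustration only, not part of the original text: it assumes a fair coin (p = 0.5), uses Python's pseudo-random generator with an arbitrarily chosen seed, and the function name is hypothetical. It shows how x/s tends to settle near p as s grows, which is support for the empirical assumption and in no sense a proof that the limit exists.

```python
import random


def relative_frequency(p: float, trials: int, rng: random.Random) -> float:
    """Return x/s, the observed relative frequency of success in `trials` trials."""
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials


rng = random.Random(12345)  # arbitrary fixed seed so that the run is repeatable
for s in (10, 100, 1_000, 10_000, 100_000):
    print(s, relative_frequency(0.5, s, rng))
```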

3. Theorems. We will now review from algebra certain elementary
formulas and theorems leading to the use of probability theory
in statistical problems. We will begin with the subject of permutations
and combinations.

A permutation is an arrangement of all or part of a set of things.
A combination is a group of all or part of a set of things. A different 
permutation may be obtained by changing either the items or their 
order but a different combination may be obtained only by changing 
one or more of the items in the group. 

Theorem I. The number of permutations of n different things taken
r at a time is denoted by the symbol P(n, r) and given by

\[ P(n, r) = n(n - 1)(n - 2) \cdots (n - r + 1). \]

Corollary. If the n items are not all different, there being $n_1$ of
type $T_1$, $n_2$ of type $T_2$, ..., $n_k$ of type $T_k$, then the number of distinct
permutations of the n items taken n at a time is

\[ \frac{n!}{n_1! \, n_2! \cdots n_k!} \]

where $n_1 + n_2 + \cdots + n_k = n$. The symbol n!, read "factorial n," is defined by

\[ n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1. \]


Theorem II. The number of combinations of n different things
taken r at a time is denoted by C(n, r) and given by

\[ C(n, r) = \frac{n(n - 1) \cdots (n - r + 1)}{r!} = \frac{n!}{r! \, (n - r)!}. \]

It will be understood that C(n, r) equals zero when r > n and equals
one when r = n.

Theorem III. The total number of combinations of n different things
taken 1, 2, ..., or n at a time is $2^n - 1$.

Proof. The formula for C(n, r) is the coefficient of the (r + 1)st
term in the binomial expansion $(x + y)^n$. Thus,

\[ (x + y)^n = x^n + C(n, 1)x^{n-1}y + C(n, 2)x^{n-2}y^2 + \cdots + C(n, r)x^{n-r}y^r + \cdots + y^n. \]

If we let x = y = 1, this becomes

\[ 2^n - 1 = C(n, 1) + C(n, 2) + \cdots + C(n, r) + \cdots + C(n, n). \]
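
Theorems I to III and the corollary can be checked numerically. The following is a minimal sketch, assuming Python 3.9 or later, where `math.perm` and `math.comb` compute P(n, r) and C(n, r); the helper `arrangements_with_repeats` and the values n = 9, r = 4 are illustrative choices, not taken from the text.

```python
from math import comb, factorial, perm


def arrangements_with_repeats(counts: list[int]) -> int:
    """Distinct permutations of n items taken n at a time, counts[i] being of type i."""
    n = sum(counts)
    result = factorial(n)
    for c in counts:
        result //= factorial(c)  # each partial quotient is a whole number
    return result


n, r = 9, 4
print(perm(n, r))                                # P(9, 4) = 9*8*7*6 = 3024
print(comb(n, r))                                # C(9, 4) = 126
print(perm(n, r) == comb(n, r) * factorial(r))   # True
# Theorem III: C(n, 1) + C(n, 2) + ... + C(n, n) = 2^n - 1
print(sum(comb(n, k) for k in range(1, n + 1)) == 2 ** n - 1)   # True
print(arrangements_with_repeats([4, 7]))         # 11!/(4! 7!) = 330
```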

Events of a set are said to be mutually exclusive if the occurrence
of any one of them on a particular occasion excludes the occurrence 
of any other on that occasion. They are said to be independent or 
dependent according as the occurrence of any one of them does not 
or does affect the occurrence of others in the set. 

If p is the probability that an event will happen in a single trial 
and q is the probability that the event will fail (to happen) in a single 
trial, then p + q = 1 and unity is the symbol for certainty.

Theorem IV. The probability that one or other of a set of mutually 
exclusive events should happen when all of them are in question is the 
sum of the probabilities for the separate events. 

Theorem V. The probability that all of a set of independent events 
will happen on a given occasion when all of them are in question is the 
product of the probabilities for the separate events. 

Theorem VI. Suppose the events are dependent. Let $p_1$ be the
probability for the happening of a first event $E_1$ and $p_2$ be the probability
for the occurrence of a second event after $E_1$ has happened. Then the
probability that both events will happen in the order named is $p_1 p_2$.
The procedure may be extended in an obvious manner to any finite 
number of events. 
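
As an illustration of Theorems IV, V, and VI, consider again a box holding 4 white and 6 red balls from which two balls are drawn. The sketch below is a hedged example rather than part of the original text; the variable names are illustrative. It applies the three theorems with exact fractions and then checks the result for drawing without replacement by listing every equally likely pair.

```python
from fractions import Fraction
from itertools import combinations

balls = ["W"] * 4 + ["R"] * 6   # 4 white and 6 red balls

# Theorem VI: dependent events, two draws without replacement.
p_both_white = Fraction(4, 10) * Fraction(3, 9)
p_both_red = Fraction(6, 10) * Fraction(5, 9)

# Theorem IV: "both white" and "both red" are mutually exclusive.
p_same_color = p_both_white + p_both_red

# Theorem V: if the first ball is replaced, the two draws are independent.
p_both_white_replaced = Fraction(4, 10) * Fraction(4, 10)

# Check the without-replacement result by direct enumeration of pairs.
pairs = list(combinations(range(len(balls)), 2))
same = sum(1 for i, j in pairs if balls[i] == balls[j])

print(p_same_color, Fraction(same, len(pairs)))   # 7/15 7/15
print(p_both_white_replaced)                      # 4/25
```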

4. Supplementary Reading. It is suggested that the student look
up the proofs of the above theorems in any college algebra text and 
review the discussions presented there. 

For the more advanced student the following references are recommended.
Some of the early chapters of the books may also be read
with profit by the beginning student. 

Books: 

The Mathematical Theory of Probabilities — Arne Fisher.

Probability — Coolidge.

Mathematical Statistics — Rietz.

Choice and Chance — Whitworth.

Probability and Its Engineering Uses — Fry.

Elements of Probability — Levy and Roth.

Introduction to Mathematical Probability — Uspensky.

Papers:

Fundamental Concepts in the Theory of Probability — Fry, American Mathematical
Monthly, vol. 41, 1934, p. 207.

On the Foundations of the Theory of Probability — Struik, Philosophy of Science,
vol. 1, no. 1, January, 1934.
Problems

1. Prove both algebraically and verbally that

(a) P(n, r) = C(n, r)P(r, r), (b) C(n, r) = C(n, n - r).

2. From among nine men A, B, C, D, E, F, G, H, I, a committee of
four men will be chosen. The nine names will be written on nine
separate cards and four cards drawn at random one at a time from
a box.

(a) In how many different ways may the four cards come out?
Ans. 3024.

(b) How many different committees are possible not including the man A?
Ans. 70.

3. Consider the word "introduce."

(a) In how many of the possible arrangements of all its letters will there be
a consonant in the first place? Ans. 201,600.

(b) From its letters how many four letter permutations consisting of three
vowels and one consonant can be formed? Ans. 480. 

(c) If five of its letters are selected at random what is the probability that 
two are vowels and three are consonants? Ans. 10/21. 

4. On a table there are four different biographies with brown backs and seven 
different novels with red backs. 

(a) If all of the books are placed upright in a row on a shelf, in how many 
different ways may they be arranged so that the orders of the colors 
are different? Ans. 330. 

(b) In how many different ways may two of the biographies and three of
the novels be selected and arranged on the shelf so that the orders of the 
books are different? Ans. 25,200. 

5. In a box there are five red billiard balls with the numbers 1, 2, 3, 4, 5, painted
on them (one on each ball), and three white billiard balls with the numbers
1, 2, 3, similarly painted on them. From the box a man draws two balls at
random. 

(a) What is the probability that one of the balls drawn is white and the other 
is red? Ans. 15/28. 

(b) What is the probability that the two balls drawn have either the same
color or the same number? Ans. 4/7. 

6. A bag contains four white, five red, and six black balls. Three are drawn 
at random. Find the probability that (a) no ball drawn is black, (b)
exactly two are black, (c) all are of the same color. 

7. An urn contains four white and five black balls. Three balls are drawn at
random and replaced by green balls. If then two balls are drawn at random,
what is the probability that they are both of the same color? Ans.
29/108.

8. Write out the expressions for C(n - 1, 2); C(n - 1, 3); C(8, x).

9. (a) Show that


(b) What is the value of the above expression when x i? 





Probability and Statistical Theory 

10. Write in expanded form: 

(«) ne'e*, x). 

z-0 

( 6 ) 

a? —1 

11. Twelve cards have been dealt, six down, and the other six showing a jack, 

two kings, a seven, a five, and a four. What is the probability that the 
next card will be a four or less? (National Mathematics Magazine, vol.
XIII, no. 2, p. 94.)

12. From an urn containing ten balls, numbered from one to ten, balls are drawn,
one by one and placed in a row of holes, numbered from one to ten, each
ball being placed in the proper hole. What is the probability that there
will not be an empty hole between two filled ones at any time of the
drawing? (American Mathematical Monthly, vol. 45, no. 9, p. 635.)
Ans. 2/14,175.

5. Repeated Trials. We now consider a theorem which is very
important both in the theory of probability and its applications in 
statistics. 

Theorem VII. Let p be the probability that an event will happen
in a single trial, and q = 1 - p the probability that the event will fail
in a single trial. Then the probability P that the event will happen
exactly x times in s trials, during which p remains constant, is given by
the (x + 1)st term of the binomial expansion:

\[ (q + p)^s = q^s + C(s, 1)pq^{s-1} + C(s, 2)p^2q^{s-2} + \cdots + C(s, x)p^xq^{s-x} + \cdots + p^s. \tag{1} \]

Proof. By Theorem V, the probability that the event will happen
x times and fail the other s - x times in any specified order is
$p^x q^{s-x}$. The number of ways in which the order may be specified
is C(s, x) or C(s, s - x). These ways are equally likely and mutually
exclusive. Therefore, by Theorem IV the required probability is
$C(s, x)p^x q^{s-x}$. We recognize this expression as the (x + 1)st term
of the binomial expansion of $(q + p)^s$.

Corollary 1. The probability that the event will happen at most
x times in s trials is the sum of all those terms of (1) in which the exponent
of p is equal to or less than x.

Corollary 2. The probability that the event will happen at least
x times in s trials is the sum of all those terms of (1) in which the exponent
of p is equal to or greater than x.

Proofs. By Theorem IV, the probability that the event will happen
at most x times is the sum of the probabilities that it will happen
0, 1, 2, 3, ..., x times. Similarly, the probability that the event will
happen at least x times is the sum of the probabilities that it will
happen s, s - 1, s - 2, ..., x times.
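
Theorem VII and its two corollaries translate directly into a short computation. The following is a minimal sketch, not part of the original text; the function names are illustrative. The single term C(s, x) p^x q^(s-x) is computed with exact fractions, and the corollaries are obtained by summing the appropriate terms. The coin of Section 2, tossed 7 times, serves as the example.

```python
from fractions import Fraction
from math import comb


def binomial_term(s: int, x: int, p: Fraction) -> Fraction:
    """The (x + 1)st term of (q + p)^s: probability of exactly x successes."""
    q = 1 - p
    return comb(s, x) * p ** x * q ** (s - x)


def at_most(s: int, x: int, p: Fraction) -> Fraction:
    """Corollary 1: sum of the terms whose exponent of p is x or less."""
    return sum(binomial_term(s, k, p) for k in range(0, x + 1))


def at_least(s: int, x: int, p: Fraction) -> Fraction:
    """Corollary 2: sum of the terms whose exponent of p is x or more."""
    return sum(binomial_term(s, k, p) for k in range(x, s + 1))


half = Fraction(1, 2)
print(binomial_term(7, 3, half))   # 35/128
print(at_least(7, 3, half))        # 99/128
print(at_most(7, 2, half))         # 29/128
```

Because "at most x - 1" and "at least x" are mutually exclusive and together exhaust all cases, their probabilities add to unity, which gives a convenient check on any computation of this kind.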


Problems 


1. A dust storm contains particles of two kinds identical except as to color, 

brown and yellow particles existing in the ratio 3:2. If five particles of 
this dust enter my eye at random determine the probability that two of 
them are brown and the other three are yellow. (See American Mathematical
Monthly, vol. 41, no. 5, May 1934.)

2. Six coins are tossed once, or what amounts to the same thing, one coin is 

tossed six times. Find the probability of obtaining heads 

(a) exactly three times 
(b) at most three times

(c) at least three times 

(d) at least once. 

3. (a) What is the probability of throwing seven in a single toss of two dice?

(b) In six tosses of two dice find the probability of throwing seven at least once. 
4. Toss six coins 64 times and record the number of times heads appear 0, 1, 2,
3, 4, 5, 6 times. (Instead of tosses, the coins may be shaken in a box.)
Compare the resulting distribution of frequencies with the terms of the
expansion of $64(\tfrac{1}{2} + \tfrac{1}{2})^6$.

5. A bag contains white and black balls in the proportion 2:3. Let the
drawing of a white ball be called a success. Three balls are
drawn separately and after each drawing the ball is returned to the bag 
and thoroughly mixed with the others so that the fundamental probability 
of success remains constant during the trials. Find the probabilities of 
0, 1, 2, 3 successes. If this experiment were repeated 125 times what is 
the theoretical frequency of each of the possible number of successes? 

6. Show that equation (1) may be written:

\[ (q + p)^s = \sum_{x=0}^{s} \frac{s!}{x! \, (s - x)!} p^x q^{s-x}. \]

7. Show that

\[ \sum_{x=1}^{s} \frac{(s - 1)!}{(x - 1)! \, (s - x)!} p^{x-1} q^{s-x} = (q + p)^{s-1} = 1. \]


8. (a) Find the values of C(18, x) for x = 0 to x = 18 inclusive. (To the instructor:
Pascal's Triangle provides a simple scheme for constructing a
table of binomial coefficients.)

(b) Evaluate $2^x/3^{18}$ for x = 0 to x = 18 inclusive.

(c) Show that $(\tfrac{1}{3} + \tfrac{2}{3})^{18}$ may be written

\[ \sum_{x=0}^{18} f(x) \quad \text{where } f(x) = C(18, x) \, 2^x/3^{18}. \]

(d) Using the results of (a) and (b), find the values of f(x) for x = 0 to
x = 18. Save your results for future reference.
6. Relative Frequencies from Dichotomous Samples. Suppose a
sample of s individuals from the same population is divided into two
groups according as they have a certain attribute or not. Such a division
is said to be dichotomous. Out of s individuals we find that x
individuals have the attribute in question and s - x do not, it being
possible for x to take any integral value from 0 to s inclusive. The
attribute in question is frequently called the "event" and its occurrence
is called a "success." The ratio x/s is called the relative
frequency of success.

Many illustrations of relative frequency come readily to mind.
Out of 100 throws of a coin we may have noted 45 heads. From 
a group of school children, taken at random, we may find 55 
boys. Or again, we might make a certain disease of children the 
basis of a dichotomy. Out of 100 fifth grade school children we 
may find that 27/100 is the relative frequency of the occurrence of 
measles. 

7. Theorem of Bernoulli. The theorem of Bernoulli describes
the approach of the relative frequency x/s to the underlying constant
probability p as s increases. The theorem may be stated as
follows:

Theorem VIII. In a set of s trials in which the chance of success in
each trial is a constant p, the probability P approaches unity that the
relative frequency x/s will approach p as a limit as s increases indefinitely.*

Observe that this is a weaker statement than saying that p is the 
limit of x/s as the number of trials increases indefinitely. Another 
way of stating the theorem is as follows: The probability Q = 1 - P
of the difference (x/s - p) being numerically as large as any assigned
positive number $\epsilon$ will approach zero as a limit as s increases indefinitely.

The theorem is the basis for our definition of empirical probability. 
It is often regarded as a fundamental theorem of mathematical 
statistics because of the common use of x/s (s large) as a close approximation
to the probability p.
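
The second form of the theorem can be examined numerically by summing the binomial terms for which x/s differs from p by at least a chosen amount. The sketch below is an illustration only, not part of the original text. The values p = 1/2 and 1/20 for the assigned positive number are arbitrary choices, the function names are hypothetical, and each term is computed through logarithms (assuming 0 < p < 1) so that large s does not cause overflow.

```python
from fractions import Fraction
from math import exp, lgamma, log


def binomial_term(s: int, x: int, p: float) -> float:
    """C(s, x) p^x q^(s-x), computed through logarithms; assumes 0 < p < 1."""
    log_c = lgamma(s + 1) - lgamma(x + 1) - lgamma(s - x + 1)
    return exp(log_c + x * log(p) + (s - x) * log(1.0 - p))


def prob_deviation_at_least(s: int, p: Fraction, eps: Fraction) -> float:
    """Q: the probability that |x/s - p| is at least eps in s trials."""
    # The comparison uses exact fractions so that boundary cases are not
    # disturbed by floating-point rounding.
    return sum(
        binomial_term(s, x, float(p))
        for x in range(s + 1)
        if abs(Fraction(x, s) - p) >= eps
    )


p, eps = Fraction(1, 2), Fraction(1, 20)
for s in (10, 100, 1000, 10000):
    print(s, prob_deviation_at_least(s, p, eps))
```

The printed values of Q decrease toward zero as s increases, in agreement with the statement of the theorem; the computation illustrates the theorem for one choice of p and does not replace the proof.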

8. Binomial Description of Frequency. The terms of $(q + p)^s$
are the theoretical relative frequencies for a dichotomous situation.
If we take N sets of s trials the theoretical absolute frequencies are
given by the terms of $N(q + p)^s$ when N is chosen so that these terms
are integers. It follows that N is merely a proportionality factor.

* A proof is given in Chapter VI, § 10. 
Hence we may say that if in a single trial the probability of an event
occurring is p and the probability of its not occurring is q, then if
a sample of s trials is taken, the frequencies with which the event
occurs 0, 1, 2, 3, ..., s times are proportional to the terms of the
point binomial $(q + p)^s$. This was the first theoretical distribution
to be established and a discussion of it is given in Ars Conjectandi by
J. Bernoulli which was published posthumously in 1713. A distribution
of discrete variates with frequencies proportional to the terms of
(1) is frequently referred to as a Bernoulli distribution.

In the Carus Monograph on Mathematical Statistics (p. 23) Professor
Rietz explains the applications and limitations of (1) in practical
statistics as follows:

Such a distribution ... serves as a norm for the distributions of relative frequencies
obtained from some of the simplest sampling operations in applied statistics.
For example, the geneticist may regard the Bernoulli distribution (1) as the
theoretical distribution of the relative frequencies x/s of green peas which he
would obtain among random samples each consisting of a yield of s peas. The
biologist may regard (1) as the theoretical distribution of the relative frequencies
of male births in samples of s births. The actuary may regard (1) as the theoretical
distribution of yearly death rates in samples of s men of equal ages, say
of age 30, drawn from a carefully described class of men. In this case we specify 
that the samples shall be taken from a carefully described class of men because 
the underlying assumptions involved in (1) do not permit a careless selection of 
data. Thus, it would not be in accord with the assumptions to take some of 
the samples from a group of teachers with a relatively low rate of mortality and 
others from a group of anthracite coal miners with a relatively high rate of 
mortality.... 

The expression "simple sampling" is sometimes applied to drawing a random
sample when the conditions for repetition just described are fulfilled. In other
words, simple sampling implies that we may assume the underlying probability
p of formula (1) remains constant from sample to sample, and that the drawings
are mutually independent in the sense that the results of drawings do not depend
in any significant manner on what has happened in previous drawings.

9. Graphical Representation. A binomial distribution may be
represented graphically by a histogram. This is accomplished by
constructing rectangles centered at x = 0, 1, 2, ..., s with heights
proportional to the terms of the binomial. The different "successes"
denoted by x are the variates, and the corresponding terms of the
binomial are the theoretical relative frequencies.

Since the values of x constitute a discrete series it might seem more
logical to represent the relative frequencies by ordinates instead of
rectangles. However, since the base of each rectangle is unity the
number representing its height is also its area, and the representation
by areas will be useful in our work. In a case like this the frequencies
are said to be "loaded" on the ordinates at the mid-points of the
class intervals.

If we are thinking of relative frequencies or probabilities the sum of
all the rectangles is unity, whereas if we are thinking of absolute
frequencies the total area of the histogram is N. Thus if six coins are
tossed 64 times the theoretical absolute frequencies are given by the
terms of $64(\tfrac{1}{2} + \tfrac{1}{2})^6$. These are 1, 6, 15, 20, 15, 6, 1
and their sum is 64.
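
The theoretical absolute frequencies for the six coins can be generated mechanically. The following is a minimal sketch, not part of the original text; the function name is illustrative, and the row of `#` marks is only a crude stand-in for the rectangles of a histogram.

```python
from fractions import Fraction
from math import comb


def theoretical_frequencies(n_sets: int, s: int, p: Fraction) -> list[Fraction]:
    """Terms of N(q + p)^s: expected counts of 0, 1, ..., s successes."""
    q = 1 - p
    return [n_sets * comb(s, x) * p ** x * q ** (s - x) for x in range(s + 1)]


freqs = theoretical_frequencies(64, 6, Fraction(1, 2))
for x, f in enumerate(freqs):
    print(f"{x}  {'#' * int(f)}  {f}")   # one mark per unit of frequency
print(sum(freqs))                        # 64
```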

Fig. 1. Histogram of $(\tfrac{1}{2} + \tfrac{1}{2})^6$

10. The Mean and Standard Deviation. We have shown that
the terms of $N(q + p)^s$ give the expected frequency of success (with
respect to an attribute or character) in drawing N samples of s items
in each sample, where p is the probability of a success. We now
propose to characterize the distribution of expected frequencies by
finding the usual moments. In this procedure we may consider the
relative frequencies given by the terms of $(q + p)^s$ because the absolute
frequencies are proportional to these terms, N being the proportionality
factor. It will be convenient to evaluate first the $\nu$'s,
taking the position of the first term as origin.

By definition

\[ \nu_1 = \frac{\sum x f(x)}{\sum f(x)} \]

where x refers to the number of successes and f(x) refers to the
corresponding probabilities which are of course the theoretical relative
frequencies. Table 1 shows the appropriate frequency table.
It is obvious that the sum of the second column is unity. To sum
the third column we factor out sp, obtaining

\[ sp\left[ q^{s-1} + (s - 1)pq^{s-2} + C(s - 1, 2)p^2q^{s-3} + \cdots + C(s - 1, x - 1)p^{x-1}q^{s-x} + \cdots + p^{s-1} \right] \]

which may be written $sp(q + p)^{s-1} = sp$. Hence, we have that the
mean number of successes in s trials is $\bar{x} = sp$, where p is the probability
of success in any trial. This result is often called the "mathematical
expectation" or the expected value of x.

Table 1

  x       f(x)                                    x f(x)
  0       q^s                                     0
  1       s p q^(s-1)                             s p q^(s-1)
  2       [s(s-1)/2!] p^2 q^(s-2)                 s(s-1) p^2 q^(s-2)
  3       [s(s-1)(s-2)/3!] p^3 q^(s-3)            [s(s-1)(s-2)/2!] p^3 q^(s-3)
  ...     ...                                     ...
  x       [s(s-1)...(s-x+1)/x!] p^x q^(s-x)       [s(s-1)...(s-x+1)/(x-1)!] p^x q^(s-x)
  ...     ...                                     ...
  s       p^s                                     s p^s
  Totals  Sum f(x) = (q + p)^s = 1                Sum x f(x) = sp(q + p)^(s-1) = sp

Table 1 assists our intuitions but logically it is unnecessary.
We could have proceeded as follows:

\[ \nu_1 = \frac{\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x}{\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x}}. \]

We observe that the divisor is unity and in the dividend we can
divide x into x!. So,

\[ \nu_1 = \sum_{x=1}^{s} \frac{s!}{(x-1)!\,(s-x)!} p^x q^{s-x}. \]

Factoring out sp, we have

\[ \nu_1 = sp \sum_{x=1}^{s} \frac{(s-1)!}{(x-1)!\,(s-x)!} p^{x-1} q^{s-x} = sp(q + p)^{s-1} \]

whence we obtain

\[ \bar{x} = sp. \tag{2} \]














We will use this procedure in finding the higher moments. Since

\[ \sum_{x=0}^{s} f(x) = \sum_{x=0}^{s} C(s, x) p^x q^{s-x} = 1 \]

we may omit it from the denominators of the rest of the $\nu$'s. By
definition then

\[ \nu_2 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x^2. \]

Writing $x^2 = x(x - 1) + x$, we have

\[ \nu_2 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x(x - 1) + \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x. \]

This simplifies into

\[ s(s - 1)p^2(q + p)^{s-2} + sp \]

so that we obtain

\[ \nu_2 = s(s - 1)p^2 + sp. \]

In order to get $\sigma$ we must know the second moment about sp.
From the relation $\mu_2 = \nu_2 - (\nu_1)^2$ we easily find that

\[ \mu_2 = spq \]

whence

\[ \sigma = (spq)^{1/2}. \tag{3} \]

Example 1. Find the mean and standard deviation of the binomial $(\tfrac{2}{5} + \tfrac{3}{5})^5$
by means of formulas (2) and (3). Verify your results by the usual procedure
for computing moments of a frequency distribution.

Solution. Here p = 3/5, q = 2/5, s = 5. By formulas (2) and (3), $\bar{x} = 3$,
$\sigma = 1.095$.

Verification.

\[ (\tfrac{2}{5} + \tfrac{3}{5})^5 = \frac{1}{5^5}\,[32 + 240 + 720 + 1080 + 810 + 243]. \]

In finding the moments we may omit the proportionality factor $1/5^5$.

  x      f      u
  0     32     -3
  1    240     -2
  2    720     -1
  3   1080      0
  4    810      1
  5    243      2

We find $\sum f = 3125$, $\sum uf = 0$, $\sum u^2 f = 3750$. Hence
$\bar{u} = 0$, $\bar{x} = x_0 + c\bar{u} = 3$, $\mu_2 = 1.2$, $\sigma_x = c\sigma_u = \sqrt{1.2}$.
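
The figures of Example 1 can be confirmed by a direct computation of the moments from the terms of the binomial. The sketch below is a minimal illustration, not part of the original text; it uses exact fractions and illustrative variable names, and compares the computed mean and variance with sp and spq.

```python
from fractions import Fraction
from math import comb, sqrt

s, p = 5, Fraction(3, 5)
q = 1 - p

# Theoretical relative frequencies f(x) for x = 0, 1, ..., s.
f = [comb(s, x) * p ** x * q ** (s - x) for x in range(s + 1)]

mean = sum(x * fx for x, fx in enumerate(f))
variance = sum((x - mean) ** 2 * fx for x, fx in enumerate(f))

print([int(fx * 5 ** 5) for fx in f])   # [32, 240, 720, 1080, 810, 243]
print(mean, s * p)                      # 3 3
print(variance, s * p * q)              # 6/5 6/5
print(round(sqrt(variance), 3))         # 1.095
```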
11. Skewness and Kurtosis. We shall now derive expressions
for the third and fourth moments. By definition

\[ \nu_3 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x^3. \]

Writing $x^3 = x(x - 1)(x - 2) + 3x^2 - 2x$, we have

\[ \begin{aligned}
\nu_3 &= \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x(x - 1)(x - 2)
        + 3\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x^2
        - 2\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x \\
      &= s(s - 1)(s - 2)p^3 + 3[s(s - 1)p^2 + sp] - 2sp \\
      &= s(s - 1)(s - 2)p^3 + 3s(s - 1)p^2 + sp.
\end{aligned} \]

Similarly, by definition

\[ \nu_4 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!} p^x q^{s-x} x^4. \]

Writing $x^4 = x(x - 1)(x - 2)(x - 3) + 6x^3 - 11x^2 + 6x$ and proceeding
in a way analogous to that for evaluating $\nu_3$, we obtain

\[ \nu_4 = s(s - 1)(s - 2)(s - 3)p^4 + 6\nu_3 - 11\nu_2 + 6\nu_1. \]

Next we desire the moments about the mean, so that we may obtain
expressions for skewness and kurtosis. From the relations

\[ \mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 \]

\[ \mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4 \]

we obtain the quite simple results

\[ \mu_3 = spq(q - p) \]

\[ \mu_4 = spq[1 + 3(s - 2)pq]. \]
Recalling that $\alpha_r = \mu_r/\sigma^r$ we have finally that

\[ \alpha_3 = \frac{q - p}{\sqrt{spq}}, \qquad \alpha_4 = \frac{1}{spq} - \frac{6}{s} + 3 . \]


We observe that none of these moments are subject to Sheppard's 
corrections because the assumption that all the values are concen¬ 
trated at the mid-point of an interval is actually true in the case of 
a binomial distribution. This is obvious graphically since each 
frequency is represented by the middle of a rectangle. 
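The closed forms for $\mu_3$, $\mu_4$, $\alpha_3$ and $\alpha_4$ can be checked against direct summation over the binomial terms. The sketch below is only an illustration: the values of `s` and `p` are arbitrary choices, not taken from the text, and the function name `central_moment` is hypothetical.

```python
from fractions import Fraction
from math import comb

def central_moment(s, p, k):
    """k-th moment about the mean sp of the binomial (q + p)^s, by direct summation."""
    q = 1 - p
    return sum(comb(s, x) * p**x * q**(s - x) * (x - s * p)**k for x in range(s + 1))

s, p = 12, Fraction(1, 4)    # illustrative values, not from the text
q = 1 - p

mu2 = central_moment(s, p, 2)
mu3 = central_moment(s, p, 3)
mu4 = central_moment(s, p, 4)

# Closed forms derived in this section.
assert mu3 == s * p * q * (q - p)
assert mu4 == s * p * q * (1 + 3 * (s - 2) * p * q)

alpha3_squared = mu3**2 / mu2**3      # alpha_3 squared, kept rational to avoid a square root
alpha4 = mu4 / mu2**2
assert alpha3_squared == (q - p)**2 / (s * p * q)
assert alpha4 == 1 / (s * p * q) - Fraction(6, s) + 3
print(float(alpha4))
```

Working with $\alpha_3^2$ instead of $\alpha_3$ keeps every comparison exact.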

12. A Recursion Formula. The moments $\mu_k$ of a Bernoulli distribution
can be obtained in an elegant manner by means of the recursion formula

\[ \mu_{k+1} = pq\left[\, sk\,\mu_{k-1} - \frac{d\mu_k}{dq} \right] . \tag{4} \]

In taking the derivative, $\mu_k$ is regarded as a function of $q$ alone, with $p = 1 - q$.
We know that $\mu_0 = 1$ and $\mu_1 = 0$, so the formula is to be used for
$k \geq 1$. Thus for

\[ k = 1, \quad \mu_2 = pq\,(s\mu_0 - 0) = spq . \]

\[ k = 2, \quad \mu_3 = pq\,[0 - (s - 2sq)] = spq(2q - 1) = spq(q - p) . \]

\[ k = 3, \quad \mu_4 = pq\,[3s^2pq + s - 6sq + 6sq^2] = spq\,[1 + 3spq - 6pq] = spq\,[1 + 3(s-2)pq] . \]

A simple proof of this formula has been given by A. T. Craig in the 
Bulletin of the American Mathematical Society, vol. 40, pp. 262-264. 
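Formula (4) lends itself to mechanical computation if each $\mu_k$ is stored as a polynomial in $q$ (with $p = 1 - q$). The following sketch, under that assumption, builds the moments by the recursion and compares them with direct summation. All function names are illustrative, and the values `s = 5`, `q = 2/5` are chosen only for the check.

```python
from fractions import Fraction
from math import comb

# A polynomial in q is a coefficient list: [c0, c1, c2] means c0 + c1*q + c2*q^2.
def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_deriv(a):
    return [i * a[i] for i in range(1, len(a))] or [0]

def poly_eval(a, q):
    return sum(c * q**i for i, c in enumerate(a))

def central_moments_by_recursion(s, kmax):
    """Return [mu_0, ..., mu_kmax] as polynomials in q, using formula (4) with p = 1 - q."""
    pq = [0, 1, -1]                       # p*q = q - q^2
    mu = [[1], [0]]                       # mu_0 = 1, mu_1 = 0
    for k in range(1, kmax):
        bracket = poly_add([s * k * c for c in mu[k - 1]],
                           [-c for c in poly_deriv(mu[k])])
        mu.append(poly_mul(pq, bracket))
    return mu

def central_moment_direct(s, p, k):
    q = 1 - p
    return sum(comb(s, x) * p**x * q**(s - x) * (x - s * p)**k for x in range(s + 1))

s, q = 5, Fraction(2, 5)
mu = central_moments_by_recursion(s, 5)
for k in range(6):
    assert poly_eval(mu[k], q) == central_moment_direct(s, 1 - q, k)
```

Storing the moments as polynomials is one convenient design; a symbolic algebra package would serve equally well.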

To summarize, we have the important characterizing functions of
a Bernoulli distribution:

  Mean:       $\bar{x} = sp$
  Variance:   $\sigma^2 = spq$
  Skewness:   $\alpha_3 = (q - p)/\sigma$
  Kurtosis:   $\alpha_4 = 1/\sigma^2 - 6/s + 3$
  Excess:     $\alpha_4 - 3 = (1 - 6pq)/spq$.
13. Mathematical Expectation. If a variable $x$ may assume any
one of a countable set of mutually exclusive values $x_1, x_2, \cdots, x_n$,
in such a way that $f(x_i)$, which we take to be single-valued and non-negative,
is the probability that $x$ takes the value $x_i$ and such that

\[ \sum_{i=1}^{n} f(x_i) = 1, \]

then $x$ is called a chance variable and $f(x)$ is defined as
the probability function of the discrete variable $x$. If the mutually
exclusive values are $0, 1, 2, 3, \cdots, s$, an example of such a law of
probability is

\[ f(x) = C(s, x)\, p^x q^{s-x} . \]

A frequency distribution whose relative frequencies are given in
accord with this law of probability is styled a Bernoulli distribution,
as we have already observed.

Let the discrete variable $x$ be subject to the law of probability
$f(x)$ and let $g(x)$ be any function of $x$. The mathematical expectation
of $g(x)$, denoted by application of the operator $E$, is then defined to be

\[ E[g(x)] = \sum_{i=1}^{n} g(x_i) f(x_i) . \]

In particular, if $g(x) = x$ then

\[ E(x) = \sum_{i=1}^{n} x_i f(x_i) = \nu_1 = \bar{x} \]

is the first moment, per unit frequency, about the origin. More
generally, if $g(x) = x^k$, $(k = 1, 2, \cdots)$, then

\[ E(x^k) = \sum_{i=1}^{n} x_i^k f(x_i) = \nu_k \]

is the kth moment about the origin. If $f(x) = C(s, x)p^x q^{s-x}$ and
$g(x) = x^k$, then

\[ E(x^k) = \sum_{x=0}^{s} x^k\, C(s, x)\, p^x q^{s-x} \tag{5} \]

defines the moments, about the origin, of a Bernoulli distribution.
In particular for $k = 1$, we have

\[ E(x) = sp . \tag{6} \]

If $g(x) = (x - sp)^k$ and $f(x) = C(s, x)p^x q^{s-x}$, then

\[ E[(x - sp)^k] = \sum_{x=0}^{s} (x - sp)^k\, C(s, x)\, p^x q^{s-x} \]

is the kth moment about the mean. For $k = 1$, we see that $E(x - sp) = 0$,
and for $k = 2$, we have

\[ \mu_2 = E(x - sp)^2 = E(x^2) - (sp)^2 \]

which we have seen reduces to

\[ E(x - sp)^2 = spq . \tag{7} \]

Equations (6) and (7) give the mean and variance with respect to
the number of successes $x$ in $s$ trials. In some statistical investigations
the data are expressed in terms of percentages or rates. When
we may assume a constant probability underlying the frequency
ratios obtained from observations we have a binomial distribution
as before but on a different scale. Instead of the variable being $x$
it is now $x/s$. In this case we have

\[ E\left(\frac{x}{s}\right) = \frac{1}{s}\,E(x) = p . \tag{8} \]

For the analogous concept relating to the variance we have

\[ E\left(\frac{x}{s} - p\right)^2 = \frac{1}{s^2}\,E(x - sp)^2 = \frac{pq}{s} . \tag{9} \]

Therefore, we see from (6) and (7) that the number of successes per
set of $s$ trials is distributed about an expected value of $sp$ with a
standard deviation of $(spq)^{1/2}$. From (8) and (9) we see that the
percentage of successes in a set of $s$ trials is distributed about an expected
value of $p$ with a standard deviation of $(pq/s)^{1/2}$.

In probability theory, the standard deviation is often called the
standard error. It is important to observe that for a fixed value of $p$
the standard error of $x$ about $sp$ increases as $s$ increases and is proportional
to $s^{1/2}$, whereas the standard error of $x/s$ about $p$ decreases
as $s$ increases, since it is proportional to $s^{-1/2}$.
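The operator $E$ translates directly into a short routine. The sketch below is a minimal illustration, assuming a Bernoulli law with arbitrarily chosen `s = 20` and `p = 1/5` (values not from the text). The name `expectation` is hypothetical.

```python
from fractions import Fraction
from math import comb

def expectation(g, values, f):
    """E[g(x)] = sum of g(x_i) f(x_i) over the admissible values x_i."""
    return sum(g(x) * f(x) for x in values)

s, p = 20, Fraction(1, 5)        # illustrative values, not from the text
q = 1 - p
f = lambda x: comb(s, x) * p**x * q**(s - x)   # Bernoulli law of probability
xs = range(s + 1)

mean_x = expectation(lambda x: x, xs, f)                            # formula (6)
var_x = expectation(lambda x: (x - s * p)**2, xs, f)                # formula (7)
mean_rate = expectation(lambda x: Fraction(x, s), xs, f)            # mean of x/s
var_rate = expectation(lambda x: (Fraction(x, s) - p)**2, xs, f)    # variance of x/s

print(mean_x, var_x, mean_rate, var_rate)   # prints: 4 16/5 1/5 1/125
```

The last two values illustrate the change of scale: the mean of the rate is $p$ and its variance is $pq/s$.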

Exercises 

1. Expand the binomial iVd + f)* for 5 = 2 and 5 = 8 . Find the theoretical 

frequencies in each case by taking N as the smallest number necessary to 
express the terms of each expansion as integers. 

2. Find the mean and standard deviation for each of the above distributions
using the appropriate formulas in (4).

3. Find $\alpha_3$, $\alpha_4$ for each of the following binomials:

(i + i)^ (* + ^)^ (i + *)“• 
4. For a certain binomial distribution $\sigma = 2.66$, $\alpha_3 = 0.318$. Find $p$, $q$, and $s$.

5. Assume that .04 is the theoretical rate of mortality in a certain age group.
Suppose an insurance company is carrying s = 1000 such cases. What is
the expected dispersion (standard error) in death rates from the theoretical
rate p = .04? What would it be if s = 10,000?

6. The value of $x$ for which $C(s, x)p^x q^{s-x}$ is the largest is called the mode of a
Bernoulli distribution. Show that the mode is the positive integral value
(or values) of $x$ for which

\[ sp - q \leq x \leq sp + p . \]

References: 

1. Mathematical Theory of Probabilities — Fisher, pp. 90-101. 

2. Mathematical Statistics — Rietz, p. 25. 

7. Suppose the law of distribution of the happening of an event in s successive
trials is given by the terms of the expansion of

\[ (q + p)^s = \sum_{x=0}^{s} C(s, x)\, p^x q^{s-x} = \sum_{x=0}^{s} P_x . \]

(a) If $s = 100$ what values of $p$ and $q$ will make $P_0 = P_1$; $P_9 = P_{10}$?

(b) Give approximate values of the P's in (a).

8. A bag contains three one dollar bills and four five dollar bills. Three bills
are drawn at random. For each one dollar bill withdrawn, three two
dollar bills are returned to the bag, and for each five dollar bill that is
drawn, a one and a two and a ten dollar bill are returned to the bag. A
second drawing of two bills is made. Designate by x and y, respectively,
the values of the first and second drawings. (a) Give in tabular form the
probabilities for each of the possible simultaneous values of x and y.
(b) Evaluate E(x) and E(y).

Solution. (a) The required probabilities are given in the cells of the
table on page 19. The marginal totals are denoted by $g(x_i)$ and
$h(y_j)$. The fact that $\sum_{i=1}^{n} g(x_i) = 1$ and $\sum_{j=1}^{m} h(y_j) = 1$ is a check on the
computations. (b) $E(x) = \sum_{i=1}^{n} x_i\, g(x_i) = 26{,}910/2730 = \$9.86$,
$E(y) = \sum_{j=1}^{m} y_j\, h(y_j) = 18{,}120/2730 = \$6.64$.

9. A bag contains three one dollar bills and two two dollar bills. Two bills are 
drawn at random. For each one dollar bill drawn two two dollar bills are 
returned to the bag, while for each two dollar bill drawn a one and a two 
dollar bill are returned to the bag. A second drawing of two bills is made. 
Designate by x and y, respectively, the values of the first and second drawings.
Give in tabular form the probabilities for each of the possible
simultaneous values of x and y. Find E(x) and E(y).

10. For the more advanced student: Read and report on the following article. 
Urn Schemata as a Basis for the Development of Correlation Theory — Rietz, 
Annals of Mathematics, (2), vol. 21 (1920), p. 306. 
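Table for Exercise 8. Each cell is the product of $g(x_i)$ and the chance of $y_j$ in the second drawing.

   y \ x  |       3       |       7       |      11       |      15      |   h(y_j)
  --------+---------------+---------------+---------------+--------------+------------
     20   | 1/35 ·  0/78  | 12/35 ·  0/78 | 18/35 ·  1/78 | 4/35 ·  3/78 |   30/2730
     15   | 1/35 ·  0/78  | 12/35 ·  3/78 | 18/35 ·  4/78 | 4/35 ·  3/78 |  120/2730
     12   | 1/35 ·  0/78  | 12/35 ·  7/78 | 18/35 · 10/78 | 4/35 ·  9/78 |  300/2730
     11   | 1/35 ·  0/78  | 12/35 ·  2/78 | 18/35 ·  8/78 | 4/35 · 18/78 |  240/2730
     10   | 1/35 ·  6/78  | 12/35 ·  3/78 | 18/35 ·  1/78 | 4/35 ·  0/78 |   60/2730
      7   | 1/35 · 36/78  | 12/35 · 21/78 | 18/35 · 10/78 | 4/35 ·  3/78 |  480/2730
      6   | 1/35 ·  0/78  | 12/35 ·  6/78 | 18/35 ·  8/78 | 4/35 ·  6/78 |  240/2730
      4   | 1/35 · 36/78  | 12/35 · 21/78 | 18/35 · 10/78 | 4/35 ·  3/78 |  480/2730
      3   | 1/35 ·  0/78  | 12/35 · 14/78 | 18/35 · 20/78 | 4/35 · 18/78 |  600/2730
      2   | 1/35 ·  0/78  | 12/35 ·  1/78 | 18/35 ·  6/78 | 4/35 · 15/78 |  180/2730
  --------+---------------+---------------+---------------+--------------+------------
   g(x_i) |    78/2730    |   936/2730    |   1404/2730   |   312/2730   |     1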
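The table for Exercise 8 can be reproduced by brute-force enumeration of both drawings. The following is a sketch under the assumption that every set of bills is equally likely to be drawn; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from collections import defaultdict

first_bag = [1, 1, 1, 5, 5, 5, 5]             # three $1 bills and four $5 bills
replacement = {1: [2, 2, 2], 5: [1, 2, 10]}   # bills returned for each bill drawn

joint = defaultdict(Fraction)                 # joint[(x, y)] = probability
first_draws = list(combinations(range(len(first_bag)), 3))
for drawn in first_draws:
    x = sum(first_bag[i] for i in drawn)
    bag = [b for i, b in enumerate(first_bag) if i not in drawn]
    for i in drawn:
        bag.extend(replacement[first_bag[i]])
    second_draws = list(combinations(range(len(bag)), 2))
    for pair in second_draws:
        y = sum(bag[j] for j in pair)
        joint[(x, y)] += Fraction(1, len(first_draws) * len(second_draws))

g = defaultdict(Fraction)   # marginal totals for x
h = defaultdict(Fraction)   # marginal totals for y
for (x, y), prob in joint.items():
    g[x] += prob
    h[y] += prob

E_x = sum(x * prob for x, prob in g.items())
E_y = sum(y * prob for y, prob in h.items())
print(E_x, E_y)   # prints: 69/7 604/91
```

In lowest terms $69/7 = 26{,}910/2730$ and $604/91 = 18{,}120/2730$, which are the fractions quoted in the solution.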


14. Approximating the Binomial with the Normal Curve. If we
plot the terms of $(q + p)^s$ as ordinates against the values of $x/\sqrt{s}$
as abscissas and draw the corresponding histogram we find that it
approaches a smooth curve as $s$ is taken larger and larger. Thus in
Figure 2 (where the vertical sides of the rectangles are omitted since
they contribute nothing to the interpretation) we see how the staircase
outline of the histogram approaches close to a continuous curve
as $s$ is taken larger.
The limiting values of $\alpha_3$ and $\alpha_4$ for the binomial as $s \to \infty$ are
those of the normal curve. Thus from

\[ \alpha_3 = \frac{q - p}{\sqrt{spq}} \quad \text{and} \quad \alpha_4 = \frac{1}{spq} - \frac{6}{s} + 3 \]

we see that $\alpha_3 \to 0$ and $\alpha_4 \to 3$ as $s \to \infty$. This
suggests the possibility of approximating the binomial
with the normal curve. As a matter of fact, it can be
proved, under certain conditions of approximation,
that $(q + p)^s$ approaches the normal curve as a limit
as $s \to \infty$. The proof* will not be given here but a
word or two about it may be appropriate. In using
the normal curve to approximate the binomial we are particularly
interested in a range of three or four standard deviations from the
mean. This fact suggests the reasonableness of assuming that the
number of successes $x'$ above or below $sp$ be considered as the same
order of magnitude as $\sigma$. This means that $x'/(spq)^{1/2}$ shall remain
finite as $s \to \infty$. Now $(spq)^{1/2}$ is of order $s^{1/2}$ if neither $p$ nor $q$
is extremely small. Hence the propriety of assuming (in the proof)
that $x'/s^{1/2}$ shall remain finite. This is the reason for plotting the
histograms (Figure 2) in terms of $x/s^{1/2}$.

Fig. 2. Showing Approach of $(q + p)^s$ to Smooth Curve as $s \to \infty$

We may expect, therefore, that the fitted normal curve will give
a fair approximation to the binomial except possibly at the extremities
of the range. When the terms of the binomial are arranged
symmetrically with respect to the mean, that is, when $p = q$, the
approximation is rather better than otherwise.

* The following references are recommended:
Mathematical Statistics (Rietz), pp. 32-35.
Probability and Its Engineering Uses (Fry), pp. 207-213.
Annals of Mathematical Statistics, vol. 1, p. 197.
Exercise

Fit a normal curve to the binomial (J + })“. Diiections: This binomial may 
be written 

2If(x) where /(x) = C(18, *) ^ • 

(See Problem 8, §5.) Next recall that the equation of the normal curve is

\[ y = \frac{N}{\sigma}\,\phi(t) \]

where $\phi(t) = \dfrac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$ and $t = \dfrac{x - \bar{x}}{\sigma}$.

If we set $N = 1$, $\bar{x} = sp$, and $\sigma = (spq)^{1/2}$ we shall expect that $y$ will give, approximately,
the values of $f(x)$ for the various values of $x$. As in Chapter VI of
Part I the following outline is suggested for organizing the computations.

    x   |   t   |   φ(t)   |   y   |   f(x)


Construct the histogram and draw the curve. It is suggested that paper ruled 
“ 20 to the inch ” be used. By comparing the last two columns and also judging 
from the figure, does the fit seem to be good, even though s is rather small and 
q = ip? 

The above exercise will help the student appreciate a theorem
which will now be introduced. The sum of successive terms of the
binomial equals the area of the corresponding rectangles in its histogram.
We may obtain an approximation to this sum by finding the
area under the fitted normal curve which these rectangles occupy.
Graphically, the values $x = 0, 1, 2, \cdots, s$ are the mid-points of the
bases of these rectangles. Therefore, if we are summing the terms
of the binomial in which $x$ ranges from $x = d_1$ to $x = d_2$, inclusive,
the corresponding area under the curve will be from $x = d_1 - \tfrac{1}{2}$ to
$x = d_2 + \tfrac{1}{2}$. We must convert these values into standard units in
order to enter a table of areas of the normal curve. Hence we have
the following theorem.

Theorem IX.* The sum of those terms of the binomial $(q + p)^s$ in
which the number of successes $x$ ranges from $d_1$ to $d_2$, inclusive, is
approximately

\[ Q = \int_{t_1}^{t_2} \phi(t)\,dt \]

where

\[ t_1 = \frac{d_1 - \tfrac{1}{2} - sp}{\sigma}, \qquad t_2 = \frac{d_2 + \tfrac{1}{2} - sp}{\sigma}, \qquad \sigma = (spq)^{1/2} . \]

* Sometimes called the De Moivre-Laplace Theorem.

Example 2. In tossing six coins what is the probability of obtaining 2, 3, 4,
or 5 heads?

Solution. We have $sp = 3$, $\sigma^2 = \tfrac{3}{2}$, $d_1 = 2$, $d_2 = 5$. Hence $t_1 = -1.5/(\tfrac{3}{2})^{1/2}
= -1.225$, $t_2 = 2.5/(\tfrac{3}{2})^{1/2} = 2.041$. Therefore,

\[ Q = \int_{-1.225}^{2.041} \phi(t)\,dt = \int_{0}^{1.225} \phi(t)\,dt + \int_{0}^{2.041} \phi(t)\,dt = .38971 + .47932 = .869 . \]

Although the use of Theorem IX assumes $s$ large we obtain here with $s$ small a
good approximation to the exact value $Q = \tfrac{7}{8} = .875$. In this example it would
have been a simple matter to evaluate and sum the terms of the binomial but
when $s$ is large and the range from $d_1$ to $d_2$ includes many terms this procedure
may be very laborious. When this is the case the above theorem gives an approximation
which may be quite satisfactory. The approximation is good if
$d_1$ lies on one side of the mean and $d_2$ on the other at approximately equal distances.
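Theorem IX is easily put into computational form. The sketch below is an illustration only: it assumes the normal area is obtained from the error function rather than from a printed table, and the function names are hypothetical.

```python
from math import comb, erf, sqrt

def normal_cdf(t):
    """Area under the standard normal curve from minus infinity to t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def binomial_sum_normal(s, p, d1, d2):
    """Theorem IX: approximate sum of the binomial terms for d1 <= x <= d2."""
    sigma = sqrt(s * p * (1 - p))
    t1 = (d1 - 0.5 - s * p) / sigma
    t2 = (d2 + 0.5 - s * p) / sigma
    return normal_cdf(t2) - normal_cdf(t1)

def binomial_sum_exact(s, p, d1, d2):
    """Exact sum of the binomial terms for d1 <= x <= d2."""
    return sum(comb(s, x) * p**x * (1 - p)**(s - x) for x in range(d1, d2 + 1))

approx = binomial_sum_normal(6, 0.5, 2, 5)   # Example 2: six coins, 2 to 5 heads
exact = binomial_sum_exact(6, 0.5, 2, 5)
print(round(approx, 3), exact)               # prints: 0.869 0.875
```

The half-unit adjustments in `t1` and `t2` are the ones stated in the theorem; omitting them noticeably worsens the agreement when $s$ is small.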



Example 3. Suppose p = .2 is the probability of success in a single trial.
Estimate the probability of obtaining less than five or more than fifteen successes
in fifty trials.

Solution. The required probability, indicated by the shaded area in Figure 3, is
$P = 1 - Q$ where $Q$ is the probability of obtaining more than 4 and less than 16
successes. In using Theorem IX, we have

\[ sp = 10, \quad \sigma = 2.828, \quad t_1 = -1.944, \quad t_2 = 1.944 . \]

Therefore,

\[ P = 1 - Q = 1 - 2\int_{0}^{1.944} \phi(t)\,dt = .0519 . \]

The exact probability is obtained by evaluating and adding the sixth to the
sixteenth terms of $(.8 + .2)^{50}$ and subtracting the result from unity. However,
instead of computing these terms separately, a systematic procedure may be set
up by which each term is made to depend upon the preceding term. Thus we
may write a binomial as follows:

\[ (q + p)^s = q^s(1 + k)^s = q^s\left\{1 + sk + \frac{s(s-1)}{2!}k^2 + \frac{s(s-1)(s-2)}{3!}k^3 + \cdots\right\} \]

where $k = p/q$. Then $q^s$ may be computed by logarithms and its product with the
terms in the brackets may be obtained on computing machines by a continuous
process. Thus for the terms within the brackets,

the second term is the first term multiplied by $sk$,

the third term is the second term multiplied by $\dfrac{s-1}{2}\,k$,

the fourth term is the third term multiplied by $\dfrac{s-2}{3}\,k$,

the rth term is the (r - 1)st term multiplied by $\dfrac{s-r+2}{r-1}\,k$.

In this way we find Q = .9497, so the required probability is P = .0503. For
most practical purposes the approximation by use of Theorem IX would be
satisfactory.
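The term-by-term procedure of Example 3 can be sketched as follows. This is an illustration in floating-point arithmetic, not the logarithm-and-machine routine of the text, and the function name is hypothetical. The printed sum comes out near .95 and can be compared with the value of Q quoted above.

```python
from math import comb

def binomial_terms_by_recursion(s, p):
    """Terms of (q + p)^s, each obtained from the preceding one."""
    q = 1.0 - p
    k = p / q
    terms = [q**s]                                   # first term, q^s
    for r in range(2, s + 2):                        # r-th term from the (r-1)st
        terms.append(terms[-1] * (s - r + 2) / (r - 1) * k)
    return terms

s, p = 50, 0.2                                       # Example 3
terms = binomial_terms_by_recursion(s, p)
Q = sum(terms[5:16])                                 # x = 5, ..., 15: sixth to sixteenth terms
P = 1.0 - Q
print(round(Q, 4), round(P, 4))
```

Each multiplier `(s - r + 2) / (r - 1) * k` is the ratio of the rth term to the preceding one, so no factorials are ever formed.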

Example 4. Find the probability that in throwing 100 coins one will obtain a
number of heads which will differ from the expected number by less than five.

Solution. Here $sp = 50$ and $\sigma = (spq)^{1/2} = 5$, so that

\[ t = \frac{4.5}{5} = .9 . \]

So the required probability is given by

\[ Q = 2\int_{0}^{.9} \phi(t)\,dt = .632 . \]

Example 5. In the binomial $(.95 + .05)^{30}$ where p = .05 is
the probability of success in a single trial, find the probability of as many as seven successes.

Solution. This binomial is too skew for a good fit with the normal curve, so
the first seven terms of the expansion are evaluated. (See Figure 4.) Their sum
is .9994 and this is the probability for less than seven successes. Therefore the
probability for seven or more successes is .0006.

Fig. 4. First Seven Terms of $(.95 + .05)^{30}$

   x         0       1       2       3       4       5       6
   Chance  .2147   .3389   .2586   .1271   .0451   .0123   .0027
15. Simple Sampling of Attributes. It is a matter of common
experience that certain fluctuations between observation and expecta¬ 
tion under a given hypothesis may be explained on the basis of chance. 
For example, in throwing 100 coins an observed result of 45 heads 
and 55 tails does not warrant the conclusion that the coins are biased. 
In such cases a very natural question arises as to what sampling 
deviations may be allowed before we conclude that they indicate the 
operation of definite and assignable causes, i.e., that the results are 
inconsistent with the given hypothesis. The theory dealing with 
such fluctuations in relative frequencies is called sampling of attri¬ 
butes. 

Suppose we are given a sample of s individuals of which x have 
a certain character or attribute. The question then arises: Is this 
result consistent with the hypothesis that the sample is drawn from 
a population having the fraction p with the given character? Could 
it reasonably have arisen on the basis of chance or is it significant of
other than chance factors? In answering this question our common- 
sense judgment is greatly aided by a probability scale for chance 
fluctuations under the given hypothesis. We therefore restate our 
question* more precisely as follows: 

Suppose the probability of an event is known from theoretical 
considerations to be equal to p. What is the probability that in s 
trials the number of successes will differ numerically from the ex¬ 
pected number x = sp by as much as (or more than) an observed 
amount d? 

The required probability may be estimated by means of the fol¬ 
lowing corollaries to Theorem IX. 

Corollary 1. The probability that the number of successes $x$ in $s$
trials will differ from the expected number $\bar{x} = sp$ by more than $|d|$ is
approximately given by $P_\delta = 1 - Q_\delta$ where

\[ Q_\delta = 2\int_{0}^{\delta} \phi(t)\,dt \quad \text{and} \quad \delta = \frac{|d| + \tfrac{1}{2}}{\sigma} . \]

Corollary 2. If the words "more than" in Corollary 1 be replaced
by "as much as," then $\delta = \dfrac{|d| - \tfrac{1}{2}}{\sigma}$.

The proofs are obvious if we admit that the normal curve fits the
histogram of the point binomial.

* See Problems in Sampling — Camp, Journal American Statistical Associa¬ 
tion, p. 964 December, 1923. 
In another slightly different form involving relative frequencies,
$Q_\delta$ gives approximations to the probability that the difference between
an observed relative frequency of success $x/s$ and the true
probability $p$ satisfies the relation

\[ \left| \frac{x}{s} - p \right| < \delta \left( \frac{pq}{s} \right)^{1/2} \]

for every assigned positive value of $\delta$.




In using Corollary 1, Table 2 gives a general idea of the magnitudes 
of probabilities for certain deviations. It is divided into two sections: 
the first section lists probabilities for specially selected deviations, 
the second section lists deviations for specially selected prob¬ 
abilities. 

A computed probability is used to scale our judgment as to whether 
the deviation in question can be explained on the basis of chance. 

Table 2. Abridged Normal Probability Scale

   Deviation    Chance of Deviation   |   Deviation    Chance of Deviation
   $\delta$     Outside $\pm\delta$   |   $\delta$     Outside $\pm\delta$
  ------------------------------------+------------------------------------
     0.5             .617             |     .67              .50
     1.0             .317             |    1.28              .20
     1.5             .134             |    1.64              .10
     2.0             .046             |    1.96              .05
     2.5             .0124            |    2.33              .02
     3.0             .0027            |    2.58              .01
     3.5             .00047           |    2.88              .004


If it cannot be so explained, it is said to be "significant" of other
than chance causes. In passing judgment on a deviation it is sometimes
difficult to give a definite answer. Good judgment in these
matters only comes from much experience in the particular field.
However, we shall not often be wrong if we draw the following
conventionalized conclusions about $P_\delta$ for a deviation outside $\pm\delta$:

If $P_\delta \geq .05$, $\delta$ is not significant.

If $P_\delta \leq .01$, $\delta$ is significant.

If $.05 > P_\delta > .01$, our conclusion about $\delta$ is doubtful and we cannot
say with much certainty whether the deviation is significant or
not until we have more information.

We see from Table 2 that this rule allows chance fluctuations to
explain a deviation from the expected value of as much as 2.58 in
standard units. In some situations it may be desirable to extend
this range and place the bounds of chance fluctuations at $\delta = \pm 3$.
There is then a correspondingly greater degree of certainty that
deviations outside these limits are significant.

Example 6. (Rietz) A group of scientific men reported 1705 sons and 1527
daughters. The examination of these numbers brings up the following fundamental
questions of simple sampling. Do these data conform to the hypothesis
that $\tfrac{1}{2}$ is the probability that a child to be born will be a boy? That is, can the
deviations be reasonably regarded as fluctuations in simple sampling under this
hypothesis? In another form, what is the probability in throwing 3232 coins
that the number of heads will differ from (3232/2) = 1616 by as much as
d = 1705 - 1616 = 89?

Solution. $s = 3232$, $(pqs)^{1/2} = 28.425$, $\delta = \dfrac{88.5}{28.425} = 3.113$,

\[ P_\delta = 1 - 2\int_{0}^{3.113} \phi(t)\,dt = 1 - .9981 = .0019 . \]

Hence we conclude that these data cannot be explained on the basis of chance,
i.e., they are inconsistent with a hypothetical sex ratio corresponding to $p = \tfrac{1}{2}$.
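The corollaries to Theorem IX and the conventional rule for significance can be combined in a few lines. The sketch below is an illustration, assuming the normal area is computed from the error function; the function names and the `inclusive` switch are hypothetical conveniences, not notation from the text.

```python
from math import erf, sqrt

def chance_outside(delta):
    """P_delta = 1 - Q_delta, where Q_delta is twice the normal area from 0 to delta."""
    return 1.0 - erf(delta / sqrt(2.0))

def deviation_probability(s, p, d, inclusive=True):
    """Corollary 2 when inclusive ("as much as"), Corollary 1 otherwise ("more than")."""
    sigma = sqrt(s * p * (1.0 - p))
    half = -0.5 if inclusive else 0.5
    delta = (abs(d) + half) / sigma
    return delta, chance_outside(delta)

def verdict(prob):
    """Conventional rule of the text, with .05 and .01 as the dividing lines."""
    if prob >= 0.05:
        return "not significant"
    if prob <= 0.01:
        return "significant"
    return "doubtful"

delta, prob = deviation_probability(3232, 0.5, 1705 - 1616)   # data of Example 6
print(round(delta, 3), verdict(prob))
```

For the data of Example 6 the computed chance agrees with the tabular value .0019 to within rounding of the table.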

16. Probable Error. The word error is technically used in statistics
to denote a deviation from the expected value. The deviation $\delta$
for which $P_\delta = .5$ is commonly called "probable error." This term
is misleading because it is not the most probable error. Equally
likely deviation would be a more appropriate name for it.

From the normal probability scale we find that this deviation is
$\delta = .6745$ in standard units or $.6745\sigma$ in arbitrary units. Hence for
a normal distribution, probable error is equivalent to the quartile
deviation which, in Part I, we have called E in x units and s in
standard units. In other words, the probability is one-half that a
variate chosen at random will have a value within the range
$E(x) \pm .6745\sigma_x$. This definition of probable error combines the
assumption of a normal distribution with the specification of an
even wager.

Used as a scale unit along the x-axis, probable error is sometimes
simply defined as a yardstick which is approximately $\tfrac{2}{3}\sigma_x$. This
definition does not impose the condition that the distribution necessarily
follow the normal curve. But there is no real gain in the removal
of this condition if, for an interpretation of the significance of
such a deviation, we must refer to a normal probability scale. That
is, in testing the significance of a discrepancy between an observed
value and the expected value there is no merit in expressing that
discrepancy in multiples of approximately $\tfrac{2}{3}\sigma$ instead of $\sigma$ itself. It
would seem that the language of probable error should be abandoned.
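The figure .6745 can be recovered without a table. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` is available (Python 3.8 or later):

```python
from statistics import NormalDist

std_normal = NormalDist()                      # mean 0, standard deviation 1

# A deviation exceeded in absolute value with chance .5 leaves .25 in each
# tail, so it is the .75 point of the normal curve.
probable_error = std_normal.inv_cdf(0.75)

inside = std_normal.cdf(probable_error) - std_normal.cdf(-probable_error)
print(round(probable_error, 4), round(inside, 4))   # prints: 0.6745 0.5
```

The second printed value confirms the "even wager": half of the area lies within $\pm .6745$ standard units of the mean.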

17. Standard Error and Correlation of Errors in Class Frequencies.
When the probability distribution of a variable is known the
expected frequency in any class interval may be determined. Suppose
we have obtained from a random sample of an infinite distribution
an observed frequency distribution. The variates, $N$ in
number, should be distributed into $n$ class intervals containing $f'_1,
f'_2, \cdots, f'_n$ each. Instead of this suppose we find $f_1, f_2, \cdots, f_n$ where

\[ \sum_{i=1}^{n} f'_i = N = \sum_{i=1}^{n} f_i . \]

Let Table 3 represent the two distributions.

Table 3

   Class    Class Mark    Observed Frequency    Theoretical Frequency
     1        $x_1$             $f_1$                  $f'_1$
     2        $x_2$             $f_2$                  $f'_2$
     ·          ·                 ·                       ·
     s        $x_s$             $f_s$                  $f'_s$
     ·          ·                 ·                       ·
     n        $x_n$             $f_n$                  $f'_n$

Suppose next that a large number of such samples of $N$ variates
each are obtained under the same essential conditions. The observed
and expected distributions will not agree in practice unless
the samples are distributed exactly as the universe from which they
are drawn. In the above table, the x's are to be regarded simply as
compartments and do not change. Only the frequencies change
from sample to sample. Any class frequency $f_s$ will vary from
sample to sample, and these values of $f_s$ will form a frequency
distribution.

It is important in certain problems to have an expression for the
expected value of the variance $\sigma_{f_s}^2$ of this distribution in terms of
observed values. To derive this expression we let $p_s = f'_s/N$ be the
probability that a variate will fall in the class $s$ and $q_s = 1 - p_s$ be
the probability that it will fall elsewhere. Then, considering the $N$
variates as observations or trials, the theoretical distribution of frequency
in this class will be given by $(q_s + p_s)^N$ and the square of the
standard deviation of $f_s$ in the theoretical distribution is given by

\[ \sigma_{f_s}^2 = N p_s q_s . \]

If we accept the observed relative frequency $f_s/N$ as an approximation
to $p_s$ then we have

\[ \sigma_{f_s}^2 = N\,\frac{f_s}{N}\left(1 - \frac{f_s}{N}\right) \]

which reduces to

\[ \sigma_{f_s}^2 = f_s\left(1 - \frac{f_s}{N}\right) \tag{10} \]

as an approximate * value of the desired expression.

* When the sample is small, researches have shown that a better approximation
can be obtained by multiplying the right side of (10) by $N/(N - 1)$. See
Rietz, Mathematical Statistics, pp. 120-122.

We will next consider the correlation between deviations from the
expected values of the frequencies in any two classes, say the sth
and tth. Let $\delta f_s$ be a deviation from the expected value or theoretical
mean of the sth class corresponding to a deviation $\delta f_t$ from
the expected value of the tth class. Since the total frequency is $N$,
$N - f_s$ is the frequency which is distributed in classes other than the
s class. If we obtain an excess $\delta f_s$ in the s class then $-\delta f_s$ must be
distributed among the other classes. If deviations from the expected
values are due only to random sampling fluctuations it is
reasonable to assume that $-\delta f_s$ is distributed among the other classes
in proportion to their expected frequencies. Therefore, as the contribution
from the $f_t$ class, we have the proportion $f_t/(N - f_s)$ and
the number $(-\delta f_s)\,f_t/(N - f_s)$.

If the mean value of $\delta f_t$ equals $-\delta f_s\, f_t/(N - f_s)$, for $\delta f_s$ assigned,
then $-f_t/(N - f_s)$ must be the regression coefficient of $\delta f_t$ on $\delta f_s$.
Therefore,

\[ r_{\delta f_s \delta f_t}\,\frac{\sigma_{\delta f_t}}{\sigma_{\delta f_s}} = -\frac{f_t}{N - f_s} \]

so that

\[ r_{\delta f_s \delta f_t}\,\sigma_{\delta f_s}\sigma_{\delta f_t} = -\frac{f_t}{N - f_s}\,\sigma_{\delta f_s}^2 = -\frac{f_t}{N(1 - p_s)}\,N p_s (1 - p_s) = -f_t\,p_s . \]

Hence we have the result

\[ r_{f_s f_t}\,\sigma_{f_s}\sigma_{f_t} = -\frac{f_s f_t}{N} . \tag{11} \]

Clearly, $r_{\delta f_s \delta f_t} = r_{f_s f_t}$, and $\sigma_{\delta f_s}^2 = \sigma_{f_s}^2$, $\sigma_{\delta f_t}^2 = \sigma_{f_t}^2$, since the $\delta$'s
measure deviations from their expected frequencies.*

For an application of the above formula and the Bernoulli Theory
in general see The Use of Statistical Techniques in Certain Problems
of Market Research (Brown). Publication of the Graduate School
of Business Administration, Harvard University, vol. XXII, no.
3, 1935.
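The variance and the product-moment of two class frequencies can be verified exactly for a small case by enumerating the multinomial distribution. The sketch below assumes three classes, with an arbitrarily chosen sample size and class probabilities (none of these values come from the text), and it works with the theoretical probabilities $p_s$ rather than with observed frequencies.

```python
from fractions import Fraction
from math import factorial

def multinomial_table(N, probs):
    """Yield (frequencies, probability) for every way of placing N variates in 3 classes."""
    p1, p2, p3 = probs
    for f1 in range(N + 1):
        for f2 in range(N - f1 + 1):
            f3 = N - f1 - f2
            coef = factorial(N) // (factorial(f1) * factorial(f2) * factorial(f3))
            yield (f1, f2, f3), coef * p1**f1 * p2**f2 * p3**f3

N = 12                                                      # illustrative sample size
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))    # illustrative class probabilities
table = list(multinomial_table(N, probs))

mean = [sum(f[i] * pr for f, pr in table) for i in range(3)]
var_s = sum((f[0] - mean[0])**2 * pr for f, pr in table)
cov_st = sum((f[0] - mean[0]) * (f[1] - mean[1]) * pr for f, pr in table)

print(var_s, cov_st)   # prints: 3 -2
```

Here the expected frequencies of the first two classes are 6 and 4, so $-f'_s f'_t/N = -24/12 = -2$, in agreement with the second printed value, while $N p_s q_s = 3$ agrees with the first.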

* The correlation of errors here is properly a multivariate problem depending
on the multinomial distribution. The argument given above indicates the
plausibility of the result but it is not to be construed as a rigorous proof. By
means of more advanced mathematics the correlation coefficient can be proved
to have the result found without making use of the assumption that any excess
frequency in one class is distributed among the other classes in proportion to
their frequencies. In other words, the assumption is superfluous.

18. The Poisson Exponential. If $p$ (or $q$) is small the normal
curve cannot ordinarily be used with confidence to approximate the
terms of the binomial $(q + p)^s$. If $s$ is large but $sp$ is in the neighborhood
where $x$ is small, a useful approximation to

\[ f(x) = \frac{s!}{x!\,(s-x)!}\, p^x q^{s-x} \tag{12} \]


may be given by means of the Poisson exponential function. Statistical
examples of this situation are sometimes called rare events
and occur in widely different fields; for example, the number born
blind per year in a large city, the number of organisms of a given
size S on a given glass slide that escape death by X-rays after being
exposed for t seconds, the number of times in a certain year that the
volume of trading on the New York Stock Exchange exceeds M
million shares, the frequency of certain "peaks" in a given time
interval such as occur in telephone "traffic," and other problems in
demands for services.

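Suppose, then, that $p$ is the probability for the occurrence of
the rare event in question and assume that $q = 1 - p$ is nearly
unity. Let $s$ be so large that $s!$ and $(s - x)!$ may be replaced
by their Stirling approximations [cf. (12) of Chapter II]. Making
these replacements, (12) becomes

\[ f(x) \approx \frac{s^{s+1/2}\, e^{-s}\, p^x q^{s-x}}{x!\,(s - x)^{s-x+1/2}\, e^{-s+x}} . \tag{13} \]

Writing the second factor in the denominator of (13) in the form
$s^{s-x+1/2}\,(1 - x/s)^{s-x+1/2}$, it is readily seen that (13) becomes

\[ f(x) \approx \frac{(sp)^x\, e^{-x}\,(1 - p)^{s-x}}{x!\,(1 - x/s)^{s-x+1/2}} . \]

Now when $x$ is small and $s$ is large,*

\[ \left(1 - \frac{x}{s}\right)^{s-x+1/2} \approx e^{-x} \]

and

\[ (1 - p)^{s-x} \approx (1 - p)^s \approx e^{-sp} . \]

* The symbol $\approx$ is used to mean "approximately equal."

Hence, writing $m = sp$, we have

\[ f(x) \approx \frac{m^x e^{-m}}{x!} . \tag{14} \]

The successive terms of

\[ e^{-m}\left(1 + m + \frac{m^2}{2!} + \cdots + \frac{m^x}{x!}\right) \tag{15} \]

give the probability of exactly $0, 1, 2, \cdots,$ or $x$ occurrences of the
rare event in $s$ trials. It is worthy of note that the Poisson exponential
has only one parameter, $m$, whereas the normal curve has two
parameters, the mean and $\sigma$.

Certain simple and interesting results may be obtained for the
moments of the distribution given by (14) when $x$ takes all integral
values from $x = 0$ to $x = s$. First we observe that when $x = s$
in (15) we have

\[ \sum_{x=0}^{s} \frac{m^x e^{-m}}{x!} = 1, \]

approximately if $s$ is large. Then

\[ E(x) = \nu_1 = \sum_{x=0}^{s} x f(x) = m e^{-m} e^{m} = m = sp, \text{ approximately.} \]

And

\[ \nu_2 = \sum_{x=0}^{s} x^2 f(x) = \sum_{x=0}^{s} x(x - 1) f(x) + \sum_{x=0}^{s} x f(x) = m(m + 1), \text{ approximately.} \]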
From these results, we have

\[ \text{Mean} = m = sp \]

\[ \mu_2 = m(m + 1) - m^2 = m . \]

\[ \therefore\ \sigma = m^{1/2} . \]

It may also be shown that

\[ \nu_3 = m(m^2 + 3m + 1) \]

\[ \nu_4 = m(m^3 + 6m^2 + 7m + 1) \]

whence we find that

\[ \mu_3 = m, \qquad \mu_4 = 3m^2 + m \]

and

\[ \alpha_3 = \frac{1}{m^{1/2}}, \qquad \alpha_4 = 3 + \frac{1}{m} . \]

It is a rather striking result that each of the mean, variance, and
$\mu_3$ is equal to $m$.

The importance of the Poisson approximation in dealing with
certain problems in telephone engineering and other fields is discussed
in Fry's book, Probability and Its Engineering Uses. The
interested student might investigate and prepare a special report on
some of these applications.
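The closeness of the Poisson exponential to the binomial, and the moment results $\mu_2 = \mu_3 = m$, $\mu_4 = 3m^2 + m$, can be examined numerically. The sketch below is an illustration with arbitrarily chosen `s = 1000` and `p = 0.002` (not from the text). The Poisson sums are cut off at 60 terms on the assumption that the remaining tail is negligible for $m = 2$.

```python
from math import comb, exp, factorial

def binomial_term(s, p, x):
    """Exact binomial term C(s, x) p^x q^(s-x)."""
    return comb(s, x) * p**x * (1.0 - p)**(s - x)

def poisson_term(m, x):
    """Poisson exponential term m^x e^(-m) / x!."""
    return m**x * exp(-m) / factorial(x)

s, p = 1000, 0.002            # illustrative rare event: large s, small p
m = s * p
for x in range(6):
    print(x, round(binomial_term(s, p, x), 5), round(poisson_term(m, x), 5))

# Moments of the Poisson exponential, summed far enough that the tail is negligible.
xs = range(60)
nu1 = sum(x * poisson_term(m, x) for x in xs)
mu2 = sum((x - m)**2 * poisson_term(m, x) for x in xs)
mu3 = sum((x - m)**3 * poisson_term(m, x) for x in xs)
mu4 = sum((x - m)**4 * poisson_term(m, x) for x in xs)
print(round(nu1, 6), round(mu2, 6), round(mu3, 6), round(mu4, 6))
```

The two columns printed in the loop let the reader judge the quality of the approximation term by term for the chosen $s$ and $p$.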


Problems 

1. Use Theorem IX to approximate the following sums: 

(o) the terms of (i + i)®® in which 50 ^ a; 70. 

(6) the terms of (.946 + .054)“^ in which x ^ 34. 

2. Fit a normal curve to the point binomial (J -|- i)®. 

3. Fit a normal curve to (J + D*. 

4. Suppose you are studying IQ's and it is known that 20% in the universe with
which you are dealing have an IQ below M, so that $\tfrac{1}{5}$ is the probability
that an individual chosen at random has an IQ below M. (M itself has
no bearing on the solution of the problem.) If a teacher had a class of
fifty which could be regarded as a random sample from this universe,
would it be exceptional if she found less than five or more than fifteen
with IQ's below M? (See Example 3.)

5. Vital statistics gathered over a long period of time indicate that 5% of
patients suffering from a certain disease die from that disease. Suppose
that out of 30 cases examined in a certain city seven deaths were reported.
Was this unusual? (See Example 5.)
6. (Camp) A dean's report showed the following figures:

                      Honor Grades         Failures          Number
   Subject           Number      %       Number      %      Examined
   German                        36        33       6.3       521
   Mathematics                   35        38       8.2       466
   Music               11        50         0       0.0        22
   All Subjects                  38                 5.4

Taking p = .38 for honor grades and p = .054 for failures find the probability:
(a) that in selecting at random (from a supposedly infinite number),
one would obtain as few honor grades as were obtained in German;
(b) as many failures; (c) in selecting 466 at random, one would obtain as
few honor grades as were obtained in mathematics; (d) as many failures;
(e) in selecting 22 at random, one would obtain no failures (as in music);
(f) eleven or more honor grades.

Hints. (a) Find sum of terms of $(.62 + .38)^{521}$ in which $x \leq 187$.

(b) See Problem 1 (b) above.

(e) Evaluate $(.946)^{22}$ by logarithms.

7. (Burgess) If analyzed past experience shows that 4% of all insured white 

males of exact age 65 have died within a year, and it is found that 60 of a 
similar group of 1000 actually die within a year, should the group be re¬ 
garded as essentially different from the general mass — that is, is the 
departure from the expected mortality greater than might be expected as 
a result of chance variation alone? 

8. (Richardson) In a coin tossing experiment in which a coin was tossed 400 

times, 250 heads appear. Do you believe the experiment was honestly 
performed? 

9. (Lovitt and Holtzclaw) Would you be willing to bet 10 to 1 that an opponent
could not throw the sum 7 with two dice at least 23 times in a hundred
throws with two dice?

10. (Lovitt and Holtzclaw) The 1919 report of the Census Bureau in its bulletin
on Mortality Statistics shows the average death rate from tuberculosis (all
forms) for the period 1906-1910 to be 163.5 per 100,000 of population
and $\sigma = 12.78$.

In the following instances is the variation from the average such as to
justify one in constructing a theory as to the causes of this variation?


California         210.4
Colorado           244.2
Michigan            99.7
N. Y. Bronx        445.7
Scranton, Pa.       97.4
11. A sociologist who is interested in the characteristics of a certain race which
we will call R, hit on the idea of trying to sort R's from non-R's in the
writings of unknown persons. Accordingly he persuaded a colleague to
let him have 64 examination papers, with names removed, from psychology
classes at Blank University. On 43 of these papers he correctly spotted
the students as R's or non-R's. In 21 cases he missed. Find the
probability of this performance having resulted from pure chance.

12. A coin is tossed a times. It is desired that the relative frequency of the
appearance of heads shall not be greater than .51 or less than .49. Find
the smallest value of a that will insure the above results with a degree of
certainty $Q_\delta \ge .90$.

Solution. We must determine a such that $Q_\delta = .90$ (at least) that

$\left| \dfrac{x}{a} - \dfrac{1}{2} \right| \le .01.$

We have

$\delta = \dfrac{.01\,a}{(apq)^{1/2}} = .02\sqrt{a}$

since $p = q = \tfrac{1}{2}$. Also

$2\int_0^{\delta} \phi(t)\,dt = .90$

whence from the tables, we find $\delta = 1.645$. Therefore,

$.02\sqrt{a} = 1.645$

and

$a = 6765.$
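The arithmetic of this solution can be checked by machine. The following is only a sketch under the normal approximation used above; the function name `tosses_needed` and its arguments are our own labels, not part of the text. It finds the limit from the inverse normal integral instead of from the tables, so the result differs slightly from the one obtained with the rounded tabular value 1.645.

```python
import math
from statistics import NormalDist

def tosses_needed(half_width, certainty, p=0.5):
    """Normal-approximation estimate of the number of tosses a.

    half_width is the allowed departure of x/a from p (here .01) and
    certainty is the required probability (here .90).
    """
    q = 1.0 - p
    # delta solves 2 * integral_0^delta phi(t) dt = certainty
    delta = NormalDist().inv_cdf(0.5 + certainty / 2.0)
    # delta = half_width * a / sqrt(a p q)  =>  sqrt(a) = delta sqrt(pq) / half_width
    root_a = delta * math.sqrt(p * q) / half_width
    return delta, root_a ** 2

delta, a = tosses_needed(half_width=0.01, certainty=0.90)
print("delta =", round(delta, 4), " a =", math.ceil(a))
```

Calling the same function with `half_width=0.002` gives the corresponding estimate for the narrower limits of Problem 13 once the required certainty is supplied.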


13. A coin is tossed a times. It is desired that the relative frequency of the 

appearance of heads shall not be greater than .502 or less than .498. Find 
the smallest value of a that will insure the foregoing results with a degree 
of certainty Qa ^ if* 

14. (Camp) A census report showed that in general 59.58% of New York City 

children went to school, but that only 56.8% of the negro children went 
to school. The number of negro children was 20,000. Was the difference 
due to chance? 

15. Read and give a report on the reference given at the end of § 17.

16. Find applications of the Poisson exponential function in the literature and 
report on them in class. 



CHAPTER II 

SOME USEFUL INTEGRALS AND FUNCTIONS 

To avoid interruption later on we will discuss here certain integrals 
and functions which will be useful in subsequent chapters. 

1. The Gamma Function. The improper integral

(1)    $\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx, \qquad n > 0,$

is called the Gamma function of the positive number n. The difference
equation

(2)    $\Gamma(n + 1) = n\,\Gamma(n)$

is easily established from (1) by integration by parts (see the chapter
on the Gamma function in any textbook on advanced calculus).
By successive reduction of (2) we obtain

$\Gamma(n + 1) = n(n - 1) \cdots (n - k)\,\Gamma(n - k)$

where k is a positive integer less than n. If n is also a positive
integer and k = n - 1 then we have

(3)    $\Gamma(n + 1) = n!$

since from (1), $\Gamma(1) = 1$. Because of (3) the Gamma function is
sometimes called the factorial function. It may be considered as a
generalization of n! when n is fractional. The graph of the function
defined in (1) is shown in Figure 6. It can be drawn from the following
values, some of which follow immediately from (2), while the others will
be established later.

$\Gamma(0) = \infty \qquad\qquad \Gamma(2) = 1$

$\Gamma(\tfrac{1}{2}) = \pi^{1/2} \qquad\quad \Gamma(3) = 2$

$\Gamma(1) = 1 \qquad\qquad\ \Gamma(4) = 6$
Other forms of (1) may be obtained by changes of variable. For
example,

(4)    $\Gamma(n) = 2\int_0^\infty y^{2n-1} e^{-y^2}\,dy$, by $x = y^2$.

From this form we can show that

(5)    $\int_0^\infty e^{-y^2}\,dy = \tfrac{1}{2}\pi^{1/2}.$

To establish (5) we first observe from (4) that

(6)    $\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-y^2}\,dy.$

Since (6) is independent of the variable of integration, we may also
write

$\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-x^2}\,dx.$

So

(7)    $[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy = 4\int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy,$

the passage from the product of two integrals to the double integral
being valid since neither the limits nor the integrand of either integral
depends on the variable in the other.

To evaluate (7) it will be convenient to change to polar coordinates.
First, however, we will make a few remarks about a change
of variables in general. Let x and y be the coordinates of a point
with respect to a set of rectangular axes in a plane, u and v the
coordinates of another point with respect to a similarly chosen set
of rectangular axes in some other plane. Suppose we have a function
of the variables (x, y),

$z = f(x, y),$

and we make x and y depend on new variables u and v by the
relations

$x = g(u, v)$ and $y = h(u, v)$.
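These relations establish a certain correspondence between the
points of the two planes. Let dA be an element of area for the
function f(x, y). Then it is shown in advanced calculus * that

$dA = \left| J\!\left(\dfrac{x, y}{u, v}\right) \right| du\,dv,$

where $\left| J\!\left(\dfrac{x, y}{u, v}\right) \right|$ is a convenient symbol for the absolute value of
the determinant

$\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \dfrac{\partial x}{\partial u}\dfrac{\partial y}{\partial v} - \dfrac{\partial x}{\partial v}\dfrac{\partial y}{\partial u},$

and the latter is called the Jacobian or functional determinant of the
transformation.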

If, then, we change (7) to polar coordinates by letting

(8)    $x = r\cos\theta, \qquad y = r\sin\theta,$

the Jacobian is

$\begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$

Therefore, the element of integration dx dy becomes $r\,dr\,d\theta$. The
limits of integration are now from 0 to $\infty$ for r and from 0 to $\pi/2$
for $\theta$. From (8), $x^2 + y^2 = r^2$. So (7) becomes †

$[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^{\pi/2}\!\!\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = 2\int_0^{\pi/2} d\theta = \pi.$

* See Goursat-Hedrick, Mathematical Analysis, vol. 1.
† The transformation to polar coordinates and subsequent
integration involves a remainder term T which is the integral
over an area between a quadrant of radius R and a square of
side R. But it can be shown that $T \to 0$ as $R \to \infty$. (Cf.
Wilson's Advanced Calculus, p. 364.)
Hence,

(9)    $\Gamma(\tfrac{1}{2}) = \pi^{1/2},$

and from (9) and (6) we obtain (5).

For a more general form of (5) we may let $y = t/(2k)^{1/2}$, $k > 0$,
and obtain

(10)    $\int_0^\infty e^{-t^2/2k}\,dt = \tfrac{1}{2}(2\pi k)^{1/2},$

and

(10a)    $\int_{-\infty}^{\infty} e^{-t^2/2k}\,dt = (2\pi k)^{1/2}.$
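Results (9) and (10a) are easily checked numerically. The sketch below is offered only as an illustration: the midpoint rule, the cut-off of twelve standard widths, and the function name `gaussian_integral` are our own choices and are not part of the derivation.

```python
import math

def gaussian_integral(k, half_range=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of exp(-t^2 / 2k) over the whole line."""
    limit = half_range * math.sqrt(k)      # the tails beyond this are negligible
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        t = -limit + (i + 0.5) * h
        total += math.exp(-t * t / (2.0 * k))
    return total * h

# (9): the square of Gamma(1/2) should equal pi
print(math.gamma(0.5) ** 2, math.pi)

# (10a): compare the numerical integral with (2 pi k)^(1/2)
for k in (0.5, 1.0, 2.0):
    print(k, gaussian_integral(k), math.sqrt(2.0 * math.pi * k))
```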


An alternate derivation of (9) may be given as follows. The right-hand
member of (7) represents the volume V under the bell-shaped surface

(11)    $z = e^{-(x^2+y^2)}$

and so from (7) we have $\Gamma(\tfrac{1}{2}) = V^{1/2}$. Since (11) is
a surface of revolution we may take as the element of
volume a cylindrical shell of radius r, thickness dr,
and height z. Then

$dV = 2\pi r\,dr\,z = 2\pi e^{-r^2} r\,dr,$

$V = 2\pi\int_0^\infty e^{-r^2} r\,dr = \pi,$

and consequently we obtain (9).

2. Stirling's Approximation. An asymptotic expression, that is,
an approximation with small percentage error, may be obtained
for n! when n is large. The following formula

(12)    $n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}$

is called Stirling's approximation. A closer approximation is

$n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}\left(1 + \dfrac{1}{12n} + \cdots\right).$

However, the first term usually gives sufficiently close approximations
if n is fairly large. A derivation of (12) may be found in
several places. Among these are

Fry, Probability and Its Engineering Uses, D. Van Nostrand Company; and
Uspensky, Introduction to Mathematical Probability, McGraw-Hill Company.

Seven-place tables of log n! up to n = 1000 are given in Glover's Tables.
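To see how small the percentage error of (12) is in practice, one may compare the formula with the exact factorial. The sketch below does so for a few values of n; the function name `stirling` and the optional 1/(12n) correction flag are our own illustrative choices.

```python
import math

def stirling(n, correction=False):
    """Stirling's approximation (12); with correction=True the factor 1 + 1/(12n) is applied."""
    value = math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    if correction:
        value *= 1.0 + 1.0 / (12.0 * n)
    return value

for n in (5, 10, 20):
    exact = math.factorial(n)
    # ratios of the approximate to the exact value, without and with the correction
    print(n, stirling(n) / exact, stirling(n, correction=True) / exact)
```

The ratio for n = 10 is the quantity asked for in Exercise 7 at the end of this chapter.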

3. The Beta Function. The definite integral

(13)    $B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx$

is called the Beta function of any two positive numbers m and n.
Another useful form is

(14)    $B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta$

which is obtained by letting $x = \sin^2\theta$ in (13).

If we let x = 1 - y, (13) becomes

$B(m, n) = -\int_1^0 (1 - y)^{m-1} y^{n-1}\,dy = \int_0^1 (1 - y)^{m-1} y^{n-1}\,dy = B(n, m).$

Therefore, m and n may be interchanged.

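A relation between the Beta and Gamma functions may be
obtained as follows. From (4) we may write

$\Gamma(m)\Gamma(n) = 4\int_0^\infty x^{2m-1}e^{-x^2}\,dx \int_0^\infty y^{2n-1}e^{-y^2}\,dy = 4\int_0^\infty\!\!\int_0^\infty x^{2m-1}y^{2n-1}e^{-(x^2+y^2)}\,dx\,dy.$

Since the region of integration is the first quadrant of the xy-plane
we have, upon changing to polar coordinates,

$\Gamma(m)\Gamma(n) = 4\int_0^\infty\!\!\int_0^{\pi/2} r^{2(m+n-1)}e^{-r^2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\;r\,d\theta\,dr$

$\qquad = \left[2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta\right]\left[2\int_0^\infty r^{2(m+n)-1}e^{-r^2}\,dr\right]$

$\qquad = B(m, n)\,\Gamma(m + n),$

by (14) and (4). Hence

(15)    $B(m, n) = \dfrac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)}.$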
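The relation between the Beta and Gamma functions can be tested numerically by evaluating the trigonometric form (14) directly and comparing it with the quotient of Gamma functions. This is only a sketch: the midpoint rule and the names `beta_numeric` and `beta_from_gamma` are assumptions of ours, and the exponents are taken with m, n at least 1/2 so that the integrand stays finite.

```python
import math

def beta_numeric(m, n, steps=20_000):
    """Midpoint rule applied to (14): 2 * integral of sin^(2m-1) cos^(2n-1) over (0, pi/2)."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.sin(theta) ** (2 * m - 1) * math.cos(theta) ** (2 * n - 1)
    return 2.0 * total * h

def beta_from_gamma(m, n):
    """Quotient of Gamma functions that the text shows to equal B(m, n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

for m, n in ((2.5, 1.5), (3.0, 2.0), (0.5, 0.5)):
    print(m, n, beta_numeric(m, n), beta_from_gamma(m, n))
```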


4. Reduction to Gamma and Beta Functions. By appropriate 
changes of variables many of the integrals that occur in statistics 
may be evaluated by expressing them in terms of Gamma and Beta 
functions. 


Examples 


(a) Prove that 




Solution. This integral may be written 


By the substitution 


this becomes 




d(y*)^ — dx 


Kv)"H0 




(6) Determine k so that 


(*>)(»-»>/* d(.»*) = 1 . 


Solution, By the substitution 


this becomes 




/2<r‘\/*“ , . 

\if) X 
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(c) Determine K so that K I (14* (te = 1 


/: 


Solviion, By the substitution z ** tan d this becomes 2K 
m - iV^ — 2. From Exercise 9 below we find that 


i 


»/2 

cos” B d$ where 


whence 




5. Incomplete Beta and Gamma Functions. The integral

(16)    $\Gamma_x(n + 1) = \int_0^x x^n e^{-x}\,dx$

is called the incomplete Gamma function. Similarly

(17)    $B_x(m, n) = \int_0^x x^{m-1}(1 - x)^{n-1}\,dx$

is called the incomplete Beta function. Both (16) and (17) are
useful functions in mathematical statistics and they have been
tabulated by Karl Pearson and his staff at the Biometric Laboratory,
University College, London. They are published by the Cambridge
University Press.


Exercises 


1. Show that the Gamma function becomes infinite when $n \to 0$. Hint. From
(2) you can obtain

$\Gamma(n + k) = (n + k - 1) \cdots (n + 1)\,n\,\Gamma(n),$

that is

$\Gamma(n) = \dfrac{\Gamma(n + k)}{n(n + 1) \cdots (n + k - 1)}.$

X es j 

4>{t) (ft = 1 where ^(0 = — 

3. Prove that r(J) « 
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4u Evaluate 


D- 


3*(1 -f x/2y dx by transforming it into a Gamma function. 


Hint, Let cy = 1 + xl2 and determine c so that 
Ana, (c«7 !)/3(67). 


6. Evaluate j — 6)^ dx, Ans, c’‘i*2“«7 !. 


6. Evaluate 


X‘- 


*35^^* dxj given that 


J ^09 
0 


t'*dx = 


Hint. r(t)=ir(l). 

7. Find the difference and the ratio between the exact value of 10! and the
approximate value obtained by using Stirling's formula.

8. Using (15) show that 


(T)‘«r 




Bim -1), ii 


9. Show that $\int_0^{\pi/2} \cos^m\theta\,d\theta = \tfrac{1}{2}B[(m + 1)/2, \tfrac{1}{2}]$. Hint. Use (14).

10. Given that $f(n) = n^{1/2}B(n/2, \tfrac{1}{2})$, show that $\lim_{n\to\infty} f(n) = (2\pi)^{1/2}$.



CHAPTER III 


GENERAL CONCEPT OF DISTRIBUTION FUNCTION OF A CONTINUOUS 
VARIABLE. GENERALIZED FREQUENCY CURVES 

1. Fundamental Notions and Definitions. The notion of distribu¬ 
tion functions relates to theoretical universes. The concept is an 
idealization of observed distributions comparable to the idealization 
of the outlines of material objects into the straight lines and circles 
of geometry. 

A continuous variable x is said to have the distribution function
f(x), which we take to be single-valued and non-negative, if the
frequency of occurrence of x in the range a < x < b is measured by

(1)    $\int_a^b f(x)\,dx.$

If x has the distribution function f(x) with total frequency N, then

(2)    $\int_{-\infty}^{\infty} f(x)\,dx = N,$

and y = f(x) is called a theoretical frequency curve or, more briefly,
a frequency curve. If the actual occurrence of the variable is limited
to a finite range, f(x) is defined to be identically zero outside that
range. If the total area under the curve is taken as unity, so that

(3)    $\int_{-\infty}^{\infty} f(x)\,dx = 1,$

then y = f(x) is variously called the probability density, the
probability distribution, or the probability function of x. Then f(x) dx
gives, to within infinitesimals of order higher than that of dx, the
probability that x lies in the interval (x, x + dx). Under condition
(3), the integral (1) denotes the probability that x lies in the interval
(a, b). Under condition (2), (1) denotes the frequency of values in
the interval (a, b). A distribution function can be regarded,
therefore, either as a frequency curve or as a probability curve according
as condition (2) or (3) is imposed. The distinction can be adjusted
by determining appropriately a constant factor in y = f(x).
2. Moments. If x is distributed in accord with the frequency
curve y = f(x), with total frequency N, the moment of order k about
the y-axis is defined by

(4)    $\nu_k = \dfrac{1}{N}\int_{-\infty}^{\infty} x^k f(x)\,dx.$

In particular, for k = 1 we have the mean, $\nu_1 = \bar{x}$,

$\bar{x} = \dfrac{1}{N}\int_{-\infty}^{\infty} x f(x)\,dx.$

If the mean is taken as the origin of measurement, so that

$\int_{-\infty}^{\infty} (x - \bar{x}) f(x)\,dx = 0,$

then the moment of order k about the mean is defined by

(5)    $\mu_k = \dfrac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^k f(x)\,dx.$

In particular, when k = 2 we have the variance, $\mu_2 = \sigma^2$,

$\sigma^2 = \dfrac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx.$

The $\mu$'s can be expressed in terms of the $\nu$'s by the relation

(6)    $\mu_k = \nu_k - C(k, 1)\nu_{k-1}\nu_1 + C(k, 2)\nu_{k-2}\nu_1^2 - \cdots + (-1)^r C(k, r)\nu_{k-r}\nu_1^r + \cdots + (-1)^{k-1}[C(k, k - 1) - 1]\nu_1^k$

where

$C(k, r) = \dfrac{k!}{r!\,(k - r)!}.$

In particular, the following relations are useful in computations:

(7)    $\mu_2 = \nu_2 - \nu_1^2$
       $\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$
       $\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4.$

The first of (7) is proved below and the others may be established
in a similar way.

$\mu_2 = \dfrac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx = \dfrac{1}{N}\int_{-\infty}^{\infty} x^2 f(x)\,dx - \bar{x}^2 = \nu_2 - \nu_1^2.$
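Relations (6) and (7) lend themselves to a short computation. The sketch below expands each moment about the mean by the binomial theorem, which is the uncombined form of (6), and checks the result against (7). The function name `central_from_raw` and the test distribution (the rectangular distribution on 0 < x < 1, for which the moment of order k about the origin is 1/(k + 1)) are our own illustrative choices.

```python
from fractions import Fraction
from math import comb

def central_from_raw(nu):
    """Moments about the mean from moments about the origin.

    nu[k] is the moment of order k about the origin, with nu[0] == 1.
    """
    mean = nu[1]
    mu = []
    for k in range(len(nu)):
        # binomial expansion of (x - mean)^k, taken term by term
        mu.append(sum((-1) ** r * comb(k, r) * nu[k - r] * mean ** r
                      for r in range(k + 1)))
    return mu

# Rectangular distribution on (0, 1): nu_k = 1 / (k + 1)
nu = [Fraction(1, k + 1) for k in range(5)]
mu = central_from_raw(nu)
print(mu[2], mu[3], mu[4])

# Direct use of relations (7) for comparison
mu2 = nu[2] - nu[1] ** 2
mu3 = nu[3] - 3 * nu[2] * nu[1] + 2 * nu[1] ** 3
mu4 = nu[4] - 4 * nu[3] * nu[1] + 6 * nu[2] * nu[1] ** 2 - 3 * nu[1] ** 4
print(mu2, mu3, mu4)
```

Exact fractions are used so that the two computations can be compared without rounding error.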





In standard units the moment of order k is defined by

(8)    $\alpha_k = \dfrac{1}{N}\int_{-\infty}^{\infty} t^k f(x)\,dx = \dfrac{\mu_k}{\sigma^k},$

where

(9)    $t = \dfrac{x - \bar{x}}{\sigma}.$

From (8) we have

$\alpha_0 = 1, \qquad \alpha_1 = 0, \qquad \alpha_2 = 1.$

Analogous definitions of moments could be given for probability
functions. When N = 1, in accordance with (3), the integrals in
(4) and (5) are also called expected values. The language of expected
values will be used in another chapter where we will be dealing more
with probability functions. Before proceeding with the discussion
of frequency curves, however, we will give an example of a
probability curve.



Example. The Cauchy curve is a classical example of a probability
distribution although its use in present day statistics is relatively unimportant. Its
equation is

(10)    $y = \dfrac{b}{\pi(b^2 + x^2)}, \qquad -\infty \le x \le \infty, \qquad b > 0.$

The curve is symmetrical, having its center at x = 0.
A simple derivation of this function is as follows. For a given real constant
b locate the point (0, b) as in the figure below. Let lines be drawn at random
through (0, b) and let $\theta$ be the variable angle between any such line and the
negative direction of the y-axis; $\theta$ varies between the limits $-\pi/2$
and $\pi/2$. The hypothesis is that all values of $\theta$ in this range are
equally likely. Denote the intercepts on the horizontal axis by x.
Clearly, $-\infty < x < \infty$. The relation between $\theta$ and x is

$\theta = \tan^{-1}\dfrac{x}{b}.$

Under the hypothesis, the probability that an angle will be
contained between $\theta$ and $\theta + d\theta$ is $d\theta/\pi$.
By differentiation we find the relation between $d\theta$ and dx to be

(11)    $\dfrac{d\theta}{\pi} = \dfrac{b\,dx}{\pi(b^2 + x^2)}.$

Therefore, the points of intersection of the lines with the x-axis are distributed
so that the probability that a value of x will fall in the range dx is given by the
right-hand member of (11). Hence the probability function for the variable x is

$f(x) = \dfrac{b}{\pi(b^2 + x^2)}$

and the probability that x lies in a finite interval (c, d) is given by

$\int_c^d \dfrac{b\,dx}{\pi(b^2 + x^2)},$

since the integral of the right-hand member of (11) from $-\infty$ to $\infty$ is equal to
unity as can easily be verified. However, we cannot speak here of the mean
value of x or of moments of higher order, since the integral

$\int_{-\infty}^{\infty} \dfrac{x^k\,dx}{b^2 + x^2}$

has no meaning for k > 0. This restriction does not apply to probability
functions in general.
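The geometric construction just described can be imitated by a sampling experiment: angles are drawn uniformly between the stated limits, the intercepts x = b tan θ are formed, and the proportion falling in an interval (c, d) is compared with the integral of the probability function, which in closed form is [arctan(d/b) - arctan(c/b)]/π. The sketch below is only an illustration; the function names, the number of trials, and the seed are arbitrary choices of ours.

```python
import math
import random

def cauchy_probability(c, d, b):
    """Integral of b / (pi (b^2 + x^2)) from c to d, in closed form."""
    return (math.atan(d / b) - math.atan(c / b)) / math.pi

def simulated_probability(c, d, b, trials=100_000, seed=1):
    """Proportion of intercepts x = b tan(theta) in (c, d) for uniformly random theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
        x = b * math.tan(theta)
        if c < x < d:
            hits += 1
    return hits / trials

b = 1.5
print(cauchy_probability(-1.0, 2.0, b), simulated_probability(-1.0, 2.0, b))
```

The two printed values should agree to about two decimal places for the number of trials used; the interval (-b, b) always has probability exactly one-half.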


3. The Pearson System. There are two systems of generalized
frequency curves in common use: the Pearson system and the
Gram-Charlier system.

During the years 1895-1916 Karl Pearson published papers in
which he showed that a set of frequency curves could be obtained
by assigning values to the parameters in a certain first order
differential equation. The Pearson school claims that all the different
Integrating the left-hand side by parts, we obtain

(14)    $t^k(a + bt + ct^2)\,y - \int y\,[akt^{k-1} + b(k + 1)t^k + c(k + 2)t^{k+1}]\,dt = \int y\,(mt^k - t^{k+1})\,dt.$

If $t^k(a + bt + ct^2)\,y$ vanishes at the ends of the range, then the first expression
in (14) vanishes. If, in (12), y = h(t) we have from (8) and (14),

(15)    $m\alpha_k + ak\alpha_{k-1} + b(k + 1)\alpha_k + c(k + 2)\alpha_{k+1} = \alpha_{k+1}.$

Assigning k successively the values k = 0, 1, 2, 3, we obtain from
(15) the four equations

(16)    $m + b = 0$
        $a + 3c = 1$
        $m + 3b + 4c\alpha_3 = \alpha_3$
        $m\alpha_3 + 3a + 4b\alpha_3 + 5c\alpha_4 = \alpha_4$

from which the parameters can be determined. Solving (16) we
obtain

(17)    $m = \dfrac{1}{D}[\alpha_3(3 + \alpha_4)]$
        $a = \dfrac{1}{D}[3\alpha_3^2 - 4\alpha_4]$
        $b = \dfrac{1}{D}[-\alpha_3(3 + \alpha_4)]$
        $c = \dfrac{1}{D}[6 + 3\alpha_3^2 - 2\alpha_4]$
        $D = 18 + 12\alpha_3^2 - 10\alpha_4.$

Carver * has expressed (17) in the more convenient form

(18)    $m = -\dfrac{\alpha_3}{2(1 + 2\delta)}, \qquad b = \dfrac{\alpha_3}{2(1 + 2\delta)},$
        $a = \dfrac{2 + \delta}{2(1 + 2\delta)}, \qquad c = \dfrac{\delta}{2(1 + 2\delta)},$
        $\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}.$

* See the Handbook of Mathematical Statistics, Rietz et al.
Substitution of the above values into (15) yields an important
recursion formula for the moments of the Pearson system:

(19)    $\alpha_{k+1} = \dfrac{k}{2 - (k - 2)\delta}\,[(2 + \delta)\alpha_{k-1} + \alpha_3\alpha_k].$

For our purposes the most important curves in the Pearson system
are the Type VII (normal curve) and Type III. These will now
be discussed in some detail.
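The algebra connecting (16), (17), and (18) can be verified by direct substitution. In the sketch below the parameters are computed from both (17) and Carver's form (18) for trial values of the third and fourth standard moments, and are then substituted into the four equations (16). The function names and the trial values 1/2 and 7/2 are assumptions made purely for illustration; exact fractions are used so that the identities hold without rounding.

```python
from fractions import Fraction

def pearson_parameters(a3, a4):
    """Parameters (m, a, b, c) from the solution (17)."""
    D = 18 + 12 * a3 ** 2 - 10 * a4
    m = a3 * (3 + a4) / D
    a = (3 * a3 ** 2 - 4 * a4) / D
    b = -a3 * (3 + a4) / D
    c = (6 + 3 * a3 ** 2 - 2 * a4) / D
    return m, a, b, c

def carver_parameters(a3, a4):
    """The same parameters from Carver's form (18)."""
    delta = (2 * a4 - 3 * a3 ** 2 - 6) / (a4 + 3)
    denom = 2 * (1 + 2 * delta)
    return -a3 / denom, (2 + delta) / denom, a3 / denom, delta / denom

a3, a4 = Fraction(1, 2), Fraction(7, 2)     # illustrative moments only
m, a, b, c = pearson_parameters(a3, a4)
print(m, a, b, c)
print(carver_parameters(a3, a4))

# Residuals of the four equations (16); each should be zero
residuals = [m + b,
             a + 3 * c - 1,
             m + 3 * b + 4 * c * a3 - a3,
             m * a3 + 3 * a + 4 * b * a3 + 5 * c * a4 - a4]
print(residuals)
```

For the normal values (third moment 0, fourth moment 3) the same functions return m = b = c = 0 and a = 1.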

Type VII. If $\alpha_3 = 0 = \delta$, then (12) becomes

$\dfrac{dy}{y} = -t\,dt$

which upon integration yields the so-called normal curve

(20)    $y = Ce^{-t^2/2}, \qquad -\infty < t < \infty.$

The constant C may be determined so that the area under the curve
is N. Imposing this condition and making use of (10a) of Chapter II
we find that $C = N/(2\pi)^{1/2}$, and so (20) becomes

$y = \dfrac{N}{(2\pi)^{1/2}}\,e^{-t^2/2}.$

It is conventional to write this in the form

(21)    $y = \dfrac{N}{\sigma}\,\phi(t)$

where

$\phi(t) = \dfrac{1}{(2\pi)^{1/2}}\,e^{-t^2/2}, \qquad t = \dfrac{x - \bar{x}}{\sigma}.$

We may call $\phi(t)$ the normalized normal curve.

Type III. If $\delta = 0$ but $\alpha_3 \ne 0$ we see from (18) that (12)
becomes

$\dfrac{dy}{y} = -\dfrac{(t + \alpha_3/2)\,dt}{1 + \alpha_3 t/2},$

which upon integration yields the Type III curve

(22)    $y = K(A + t)^{A^2-1}e^{-At}$

where $A = 2/\alpha_3$, the range being $(-A, \infty)$. The criterion for a
Type III curve is that $\delta = 0$. That is, if a Type III curve is to
represent an observed distribution the observed moments should
satisfy, at least approximately, the relation

$2\alpha_4 - 3\alpha_3^2 - 6 = 0.$

Definitions of moments of an observed distribution are given in
Part I.

The constant K in (22) may be determined by the condition

(23)    $K\int_{-A}^{\infty} (A + t)^{A^2-1}e^{-At}\,dt = N.$

This integral can be evaluated by means of the Gamma function.
Let $A^2 = n/2$ and let $A(A + t) = \chi^2/2$. Then we have
$(A + t)^{A^2-1} = (\chi^2/2A)^{(n-2)/2}$, $e^{-At} = e^{n/2}e^{-\chi^2/2}$, and $dt = d(\chi^2)/2A$.

Making these substitutions in (23) we obtain

$\dfrac{Ke^{n/2}}{(2A)^{n/2}}\int_0^\infty (\chi^2)^{(n-2)/2}e^{-\chi^2/2}\,d(\chi^2) = \dfrac{Ke^{n/2}\,\Gamma(n/2)}{A^{n/2}} = N$

and therefore

$K = \dfrac{NA^{n/2}e^{-n/2}}{\Gamma(n/2)}.$

So with $\chi^2$ as the independent variable, (22) becomes

(22a)    $y = \dfrac{N}{2^{n/2}\,\Gamma(n/2)}\,(\chi^2)^{(n-2)/2}e^{-\chi^2/2}.$

When N = 1, (22a) defines the probability distribution of $\chi^2$. This
is an important function which we shall use in subsequent
discussions.

The designation Type III is usually restricted to the case
for which $A^2 \ne 1$. When $A^2 > 1$, that is, when $|\alpha_3| < 2$, the curve is
bell-shaped as shown in Figure 10.

Fig. 10. Type III Curve when $|\alpha_3| < 2$

In the Pearson system, the distance between the mean and
mode is $m = -\alpha_3/[2(1 + 2\delta)]$, and is a measure of skewness.
Under the conditions imposed for Type VII, m = 0. For Type III,
however, $m = -\alpha_3/2$ and therefore we have

$|\text{mean} - \text{mode}| = \dfrac{|\alpha_3|}{2}.$

Because of this relation, $|\alpha_3|/2$ is sometimes used as a measure of
skewness in observed distributions. The curve for $\alpha_3 = -k$ (k =
a constant) is a reflection of that for $\alpha_3 = k$ through the line t = 0.

When $A^2 < 1$, that is, when $|\alpha_3| > 2$, the curve is J-shaped with
an infinite ordinate at t = -A.

The special case for which $A^2 = 1$ is known in the Pearson system
as Type X. When $\alpha_3 = \pm 2$, (22) becomes

$y = Ke^{\mp t}.$

This is also known as Laplace's second frequency curve.

Tables of ordinates and areas of the Type III curve have been
published by Salvosa in the Annals of Mathematical Statistics, vol. 1, no. 2.

A systematic treatment of all the curves in the Pearson system
has recently been given in a paper entitled A New Exposition and
Chart for the Pearson System of Frequency Curves by C. C. Craig,
Annals of Mathematical Statistics, vol. 7, no. 1, pp. 16-28.

4. Genesis of the Pearson Curves in the Theory of Probability.
The differential equation (12) is supposed to have some support
in the theory of probability. This claim rests on the assumption
that the distribution of statistical material may be likened to a priori
distributions in certain urn schemata. The method by which (12)
is associated with underlying probabilities is started by considering
the following problem.

An urn contains n balls of which np are white, so that the
probability of drawing a white ball in a single trial is p. The rest of the
balls, nq, are black, and the probability of failure to draw a white
ball in a single trial is q = 1 - p. If s balls are drawn from the
urn one at a time with replacements after each draw, what is the
probability, B(x), of drawing exactly x white balls and (s - x)
black balls?

From the Bernoulli theory it is known that the probabilities of
getting x = 0, 1, 2, ..., s successes in s trials are given by the
successive terms of the binomial

(24)    $(q + p)^s = \sum_{x=0}^{s} B(x)$

where

$B(x) = C(s, x)\,p^x q^{s-x}.$

Representing the terms B(x) by ordinates $y_x$, one may plot the
(s + 1) points $(x, y_x)$. Through these (s + 1) points one may
imagine a curve that can be represented by an analytic function.
Since

$y_x = C(s, x)\,p^x q^{s-x}$

and hence

$y_{x+1} = C(s, x + 1)\,p^{x+1}q^{s-x-1},$

we have

(25)    $\dfrac{y_{x+1}}{y_x} = \dfrac{sp - px}{qx + q}.$

From (25) we obtain

(26)    $\dfrac{y_{x+1} - y_x}{y_{x+1} + y_x} = \dfrac{sp - q - x}{sp + q + (q - p)x}.$

Now the mean of any two ordinates ($y_x$ and $y_{x+1}$) may be considered
as approximately equal to the ordinate ($y_{x+1/2}$) midway between
them. The slope of the line joining any two points $(x, y_x)$ and
$(x + 1, y_{x+1})$ is also approximately equal to the slope of the tangent
at the point midway between these two points on the continuous
curve. Under these two assumptions, (26) may be written as

(27)    $\dfrac{1}{y_{x+1/2}}\left(\dfrac{dy}{dx}\right)_{x+1/2} = \dfrac{2(sp - q - x)}{sp + q + (q - p)x}.$

The right member of this equation is, therefore, the derivative
of log y at the point $(x + \tfrac{1}{2}, y_{x+1/2})$. At any point $(x, y_x)$ this
derivative is

(28)    $\dfrac{d(\log y)}{dx} = \dfrac{2\{sp - q - (x - \tfrac{1}{2})\}}{sp + q + (q - p)(x - \tfrac{1}{2})}.$

If $p = q = \tfrac{1}{2}$ then (28) becomes

$\dfrac{d}{dx}(\log y) = -\dfrac{x - s/2}{s/4 + 1/4},$

which is of the form

$\dfrac{dy}{y} = -\dfrac{(x - m)\,dx}{a}, \qquad a > 0,$

and which, upon integration, yields the normal curve

(29)    $y = ke^{-(x-m)^2/2a}$

where

$m = \dfrac{s}{2}, \qquad a = \dfrac{s + 1}{4}.$

The next step consists in dealing with the case $p \ne q$. From (28)
we have

$\dfrac{d}{dx}(\log y) = -\dfrac{(x - sp) + \dfrac{q - p}{2}}{spq + \dfrac{1}{4} + \dfrac{(x - sp)(q - p)}{2}}.$

If we set

$t = \dfrac{x - sp}{(spq)^{1/2}}, \qquad \alpha_3 = \dfrac{q - p}{(spq)^{1/2}},$

the above equation becomes

(30)    $\dfrac{d}{dt}(\log y) = -\dfrac{\dfrac{\alpha_3}{2} + t}{1 + \dfrac{\alpha_3}{2}t + \dfrac{1}{4spq}}.$

If spq is so large that 1/4spq is negligibly small, (30) becomes

(31)    $\dfrac{d(\log y)}{dt} = -\dfrac{\dfrac{\alpha_3}{2} + t}{1 + \dfrac{\alpha_3}{2}t}$

which upon integration yields the Type III curve. It is evident
from (31) that this curve approaches the normal curve as a limit as
$\alpha_3 \to 0$.

With p = q, (28) is of the form (12) when b = c = 0. With
$p \ne q$, (28) is of the form (12) when c = 0. To produce, in the
theory of probability, an expression comparable to (12) when both
b and c are different from zero it is necessary to consider a more
general urn problem. So far the underlying probability, p, has
been constant. If we consider the urn schemata previously described,
but remove the restriction of replacements, then the chance of success
is not constant from trial to trial but depends upon the results of
previous trials. Thus, without replacements, the chances of
obtaining exactly x = 0, 1, 2, ..., s white balls in a draw of s balls are
given by the successive terms of the hypergeometric series

(32)    $\dfrac{1}{C(n, s)}\{C(np, 0)C(nq, s) + C(np, 1)C(nq, s - 1) + \cdots + C(np, x)C(nq, s - x) + \cdots + C(np, s)C(nq, 0)\}$

in which the general term is

$H(x) = \dfrac{(np)!\,(nq)!\,s!\,(n - s)!}{(np - x)!\,(nq - s + x)!\,n!\,x!\,(s - x)!}.$

By representing the terms of this series as ordinates of a frequency
polygon, it is possible to show that * the slope, at the mid-point of
any side, divided by the ordinate at that point is equal to a fraction
whose numerator is a linear function of x and whose denominator
is a quadratic function of x. It is clear that (12) gives a general
statement of this property.

Since the hypergeometric series is associated with (12) and the
Bernoulli series is associated with a special case of (12), viz., when
c = 0, we should quite naturally expect that the Bernoulli series
is a special case of the hypergeometric series. Writing H(x) in the
form

$H(x) = \dfrac{s!}{x!\,(s - x)!} \cdot \dfrac{p\{p - 1/n\} \cdots \{p - (x - 1)/n\}\;q\{q - 1/n\} \cdots \{q - (s - x - 1)/n\}}{\{1 - 1/n\} \cdots \{1 - (x - 1)/n\}\{1 - x/n\} \cdots \{1 - (s - 1)/n\}}$

it is obvious that

$\lim_{n \to \infty} H(x) = C(s, x)\,p^x q^{s-x} = B(x).$

When $n = \infty$, there is an infinite supply in the urn, so the
probability, p, remains constant from trial to trial without replacements.
In other words, sampling from a finite supply with replacements is
the same as sampling from an infinite supply without replacements.
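The approach of the hypergeometric term to the binomial term as the supply in the urn increases can be observed numerically. The following sketch evaluates both for urns of increasing size with the proportion of white balls held fixed; the function names and the particular values p = .3, s = 5, x = 2 are chosen by us only for illustration.

```python
from math import comb

def hypergeometric_term(x, s, white, black):
    """Chance of exactly x white balls in s draws without replacement."""
    return comb(white, x) * comb(black, s - x) / comb(white + black, s)

def binomial_term(x, s, p):
    """Chance of exactly x white balls in s draws with replacement."""
    return comb(s, x) * p ** x * (1.0 - p) ** (s - x)

s, x = 5, 2
for n in (10, 100, 1000, 10000):
    white = 3 * n // 10            # np with p = .3
    black = n - white              # nq
    print(n, hypergeometric_term(x, s, white, black), binomial_term(x, s, 0.3))
```

For the smallest urn the two chances differ appreciably; for the largest they agree to several decimal places.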

* See 1. Elderton, Frequency Curves and Correlation.
      2. Rietz, Mathematical Statistics (Carus Monograph), Chapter III.
5. Further Discussion of the Normal Curve. We will now return
to a discussion of the normal curve, giving some proofs which had to
be omitted in Part I, and supplying explanations which in one
instance or another perhaps had to be read between the lines there.

A. Fitting the Curve. If (29) is to represent an observed
distribution, the parameters m, a, and k may be determined by the principle
of moments. Equating the functional moments of orders 0, 1, and 2
to the corresponding moments of the observed data, we have three
simultaneous equations

(33)    $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,dx = N$

        $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,x\,dx = N\bar{x}$

        $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,x^2\,dx = N\nu_2$

in which the parameters are the unknowns.

The solution of these equations can be made to depend upon the
integral

(34)    $\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = (2\pi a)^{1/2}$

which is evaluated in Chapter II. Using this result and letting
y = x - m, the first of equations (33) becomes

(a)    $k(2\pi a)^{1/2} = N.$

The second becomes

$k\int_{-\infty}^{\infty} y\,e^{-y^2/2a}\,dy + km\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = N\bar{x}.$

In the above relation, the first integral vanishes because the
integrand is an odd function. So, using (34), we have

(b)    $km(2\pi a)^{1/2} = N\bar{x}.$

The third integral in (33) may be written in the form

$k\int_{-\infty}^{\infty} y^2 e^{-y^2/2a}\,dy + 2km\int_{-\infty}^{\infty} y\,e^{-y^2/2a}\,dy + km^2\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy.$

Upon integrating (by parts) the first integral in the above expression
and evaluating the other integrals, we obtain

(c)    $k(2\pi a)^{1/2}(m^2 + a) = N\nu_2.$
From (a) and (b) we find $m = \bar{x}$. From (a) and (c) we have

$m^2 + a = \nu_2$

and so

$a = \nu_2 - \bar{x}^2 = \mu_2 = \sigma^2.$

Therefore, (29) becomes

(35)    $y = \dfrac{N}{\sigma(2\pi)^{1/2}}\,e^{-(x-\bar{x})^2/2\sigma^2}.$






B. Moments. The general moment of odd order of (35) about
the mean is given by

$\mu_{2k+1} = \dfrac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty} (x - \bar{x})^{2k+1}e^{-(x-\bar{x})^2/2\sigma^2}\,dx.$

But the right member vanishes because the integrand is an odd
function. Therefore, all moments of odd order of the normal curve
taken about the mean are zero.

The general moment of even order is

$\mu_{2k} = \dfrac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty} (x - \bar{x})^{2k}e^{-(x-\bar{x})^2/2\sigma^2}\,dx.$

Integrating the right member by parts, letting $u = (x - \bar{x})^{2k-1}$, the
following recursion relation is obtained for even moments

(36)    $\mu_{2k} = (2k - 1)\sigma^2\mu_{2k-2}.$

Then when k = 1, $\mu_2 = \sigma^2$; when k = 2, $\mu_4 = 3\mu_2^2$; etc.

A recursion formula for the moments in standard units may also
be obtained from (19). Under the conditions imposed for Type VII,
(19) becomes

$\alpha_{k+1} = k\alpha_{k-1}, \qquad k = 1, 3, 5, \cdots.$

Hence,

$\alpha_2 = 1$
$\alpha_4 = 3$
$\alpha_6 = 1 \cdot 3 \cdot 5$
$\alpha_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1) = \dfrac{(2k)!}{2^k\,k!}.$
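The recursion for the standard moments of the normal curve, in which each moment of order k + 1 is k times the moment of order k - 1, can be run mechanically and compared with the closed form (2k)!/(2^k k!). In the sketch below the function name `normal_alpha` is ours; the recursion is applied for every k, which also reproduces the vanishing of the odd moments.

```python
from math import factorial

def normal_alpha(max_order):
    """Standard moments alpha_0 ... alpha_max_order of the normal curve by recursion."""
    alpha = [1, 0]                            # alpha_0 = 1, alpha_1 = 0
    for k in range(1, max_order):
        alpha.append(k * alpha[k - 1])        # alpha_{k+1} = k * alpha_{k-1}
    return alpha

alpha = normal_alpha(8)
print(alpha)
for j in range(5):
    # closed form for the even moments
    print(2 * j, alpha[2 * j], factorial(2 * j) // (2 ** j * factorial(j)))
```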
C. Quadrature. Some writers use the term quadrature for the
evaluation of an integral. The definite integral

(37)    $\dfrac{h}{\sqrt{\pi}}\int_0^t e^{-h^2x^2}\,dx$

is commonly called the probability integral. Clearly it is a function
of the variable limit t. Although (37) cannot be evaluated in finite
form, it can be computed by expanding the integrand into a power
series and integrating as many terms as may be needed.

In (37) let y = hx. When x = t, y = ht. So (37) becomes

(38)    $\dfrac{1}{\sqrt{\pi}}\int_0^{ht} e^{-y^2}\,dy.$

Expanding the integrand of (38) we have

$e^{-y^2} = 1 - y^2 + \dfrac{y^4}{2!} - \dfrac{y^6}{3!} + \cdots + (-1)^{n-1}\dfrac{y^{2n-2}}{(n - 1)!} + \cdots.$

Termwise integration yields the result

(39)    $\dfrac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \dfrac{1}{\sqrt{\pi}}\left[y - \dfrac{y^3}{3} + \dfrac{y^5}{10} - \dfrac{y^7}{42} + \dfrac{y^9}{216} - R\right], \qquad R < \dfrac{y^{11}}{1320}.$

This series converges for all values of y, and the error made in stopping
at any term is numerically less than the first term neglected. For
small values of y it converges rapidly and is a satisfactory method for
computing when $y \le 1$.

But for large values of y, (39) converges too slowly to be practical;
too many terms are required. It is therefore important to obtain
an expansion in descending powers of y. To this end write


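(40)    $\dfrac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \dfrac{1}{\sqrt{\pi}}\left[\int_0^\infty e^{-y^2}\,dy - \int_y^\infty e^{-y^2}\,dy\right] = \dfrac{1}{2} - \dfrac{1}{\sqrt{\pi}}\int_y^\infty e^{-y^2}\,dy,$

and

$\int_y^\infty e^{-y^2}\,dy = \int_y^\infty \dfrac{1}{y}\left(y\,e^{-y^2}\right)dy.$

Integrating the last integral by parts we obtain

$\int_y^\infty e^{-y^2}\,dy = \dfrac{e^{-y^2}}{2y} - \dfrac{1}{2}\int_y^\infty \dfrac{e^{-y^2}}{y^2}\,dy.$

Integrating successively by parts gives the result

(41)    $\int_y^\infty e^{-y^2}\,dy = \dfrac{e^{-y^2}}{2y}\left[1 - \dfrac{1}{2y^2} + \dfrac{1 \cdot 3}{4y^4} - \dfrac{1 \cdot 3 \cdot 5}{8y^6} + \cdots\right].$

From (40) and (41) we have the final result

(42)    $\dfrac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \dfrac{1}{2} - \dfrac{e^{-y^2}}{2y\sqrt{\pi}}\left[1 - \dfrac{1}{2y^2} + \dfrac{1 \cdot 3}{4y^4} - \dfrac{1 \cdot 3 \cdot 5}{8y^6} + \cdots + (-1)^n T_{n+1} + \cdots\right]$

where

$T_{n+1} = \dfrac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n y^{2n}}$

and (n + 1) is the number of the term. The series in (42) is
called an asymptotic or semi-convergent series; it converges until a
minimum term is reached and then diverges. The general term
$T_{n+1}$ decreases so long as $n \le y^2$. But after the integrations by
parts have been performed so many times that $n > y^2$, $T_{n+1}$ increases.
Of course the integrations should not be carried further. The value
obtained by using the series in (42) will differ from the true value by
less than the last term retained.

Tables of (37) may be computed by means of (39) for $y\,(= ht) \le 1$
and by (42) for y > 1. Such tables were computed long ago and
are available in many places.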


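Example. Evaluate (37) for t = 3 and check the result with the value of
$\int_0^t \phi(t)\,dt$ for t = 3 given in the tables in the Appendix.

Solution. Since $y = t/\sqrt{2}$ we are to evaluate (42) for $y = 3/\sqrt{2}$.
Substituting this value in (42) we have

$\dfrac{1}{\sqrt{\pi}}\int_0^{3/\sqrt{2}} e^{-y^2}\,dy = \dfrac{1}{2} - \dfrac{e^{-9/2}}{3\sqrt{2\pi}}\left[1 - \dfrac{1}{9} + \dfrac{3}{81} - \dfrac{15}{729} + \dfrac{105}{6561}\right]$

$\qquad = .5 - \tfrac{1}{3}\,\phi(3)\,(.9213)$

$\qquad = .5 - .00136 = .49864.$

The value given in the tables is .49865.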
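Both methods of quadrature can be programmed in a few lines. The sketch below sums the ascending series (39) for small y and the asymptotic series (42) for large y, stopping the latter while n does not exceed the square of y, as the text directs. The function names are our own; the library error function `math.erf`, which equals twice the integral in (38), is used only as an independent check.

```python
import math

def ascending_series(y, terms=30):
    """Series (39): sum of (-1)^n y^(2n+1) / (n! (2n+1)), divided by sqrt(pi)."""
    total = sum((-1) ** n * y ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
                for n in range(terms))
    return total / math.sqrt(math.pi)

def asymptotic_series(y):
    """Series (42), carried only while the terms are still decreasing (n <= y^2)."""
    bracket, term, n = 1.0, 1.0, 0
    while n + 1 <= y * y:
        n += 1
        term *= -(2 * n - 1) / (2.0 * y * y)    # next term (-1)^n T_{n+1}
        bracket += term
    return 0.5 - math.exp(-y * y) / (2.0 * y * math.sqrt(math.pi)) * bracket

y_small, y_large = 0.5, 3.0 / math.sqrt(2.0)
print(ascending_series(y_small), math.erf(y_small) / 2.0)
print(asymptotic_series(y_large), math.erf(y_large) / 2.0)
```

For y = 3/√2 the asymptotic series retains five terms, the same number used in the worked example above.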
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6. The Gram-Charlier Series. If a function /(x) gives only a 
rough approximation to a frequency distribution, a more accurate rep¬ 
resentation may be obtained by using the first few terms of the series 

(43) F{X) = Cofix) + C,/<«(X) + + . . . + + . .. 

where fix), called the “ generating function,gives a first approxi¬ 
mation to the given distribution, and is the nth derivative 

of fix) with respect to x. 

It should be observed that series representation is also involved in 
the Pearson system. For, suppose the differential equation underlying
that system is written in the form

$\frac{dy}{dx} = \frac{y(a - x)}{f(x)}.$

Then if it be assumed that $f(x)$ is expressible as a power series which
is so rapidly convergent that the first few terms are sufficient, we have 
the form given in (12). In the Pearson system the series occurs in 
the differential equation of the function whereas in the Gram-Charlier 
system it occurs in the function itself. 

If in (43) the normal curve is taken as the generating function 
then $F(x)$ is known as the Gram-Charlier Type A series. In discussing
this series no essential loss of generality is suffered by using
standard units. Thus we may write

(44) $F(t) = c_0\phi(t) + c_1\phi^{(1)}(t) + c_2\phi^{(2)}(t) + \cdots + c_n\phi^{(n)}(t) + \cdots$

where $\phi(t)$ is defined in (21). The moments of $F(t)$ are defined by

(45) $\alpha_n = \int_{-\infty}^{\infty} F(t)\,t^n\,dt$

and it follows that $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$.

The coefficients $c_n$ in (44) may be expressed in terms of the moments
$\alpha_n$, because the functions $\phi^{(n)}(t)$ and the Hermite polynomials
$H_m(t)$ defined by the relation

(46) $\phi^{(m)}(t) = (-1)^m H_m(t)\,\phi(t)$

form a biorthogonal system. That is

(47) $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_m(t)\,dt = 0$ for $m \ne n$,

(48) $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_n(t)\,dt = (-1)^n\,n!$ for $m = n$.

Proofs of (47) and (48) are available in the literature * and will be
omitted here. The recursion relation

(49) $H_{n+1}(t) = tH_n(t) - nH_{n-1}(t)$

can be established.† By differentiating we find from (46) that
$H_1 = t$ and since $H_0 = 1$ we can use (49) for $n \ge 1$.
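
The recursion (49) and the biorthogonal relations (47) and (48) can be checked mechanically. The sketch below is an illustration only; the helper names (`hermite`, `normal_moment`, `weighted_product`) are hypothetical. It relies on (46), which turns $\int \phi^{(n)} H_m\,dt$ into $(-1)^n \int H_n H_m \phi\,dt$, and on the known moments of the normal curve, $\int t^k \phi\,dt = 1\cdot 3\cdots(k-1)$ for even $k$ and zero for odd $k$.

```python
from math import factorial

def hermite(n_max):
    """Coefficient lists (lowest power first) of H_0 ... H_{n_max}, built from (49)."""
    H = [[1], [0, 1]]                                   # H_0 = 1, H_1 = t
    for n in range(1, n_max):
        t_times_hn = [0] + H[n]                         # t * H_n(t)
        n_times_prev = [n * c for c in H[n - 1]]        # n * H_{n-1}(t)
        n_times_prev += [0] * (len(t_times_hn) - len(n_times_prev))
        H.append([a - b for a, b in zip(t_times_hn, n_times_prev)])
    return H[: n_max + 1]

def normal_moment(k):
    """Integral of t^k phi(t) dt over the whole line."""
    if k % 2:
        return 0
    m = 1
    for j in range(1, k, 2):
        m *= j
    return m

def weighted_product(p, q):
    """Integral of p(t) q(t) phi(t) dt, term by term from the normal moments."""
    return sum(a * b * normal_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

if __name__ == "__main__":
    H = hermite(4)
    print(H[3], H[4])            # coefficients of H_3 and H_4
    for n in range(5):
        print([weighted_product(H[n], H[m]) for m in range(5)])
```

Because the arithmetic is carried out in whole numbers, the printed table should show $n!$ on the diagonal and zero elsewhere, which is (47) and (48) apart from the factor $(-1)^n$.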

To make use of the biorthogonal property noted above we multiply 
both members of (44) by $H_n(t)$ and integrate; under the assumption
that the series is uniformly convergent, we obtain

(50) $\int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt = c_n\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_n(t)\,dt = c_n(-1)^n\,n!$

since all terms of the right member vanish except the one with the
coefficient $c_n$. Hence from (50) we have

(51) $c_n = \frac{(-1)^n}{n!}\int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt.$

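From (51), (49), and (45) we obtain the following results:

$c_0 = \int_{-\infty}^{\infty} F(t)\,dt = 1,$

$c_1 = -\int_{-\infty}^{\infty} F(t)\,t\,dt = 0,$

$c_2 = \frac{1}{2!}\int_{-\infty}^{\infty} F(t)(t^2 - 1)\,dt = 0,$

$c_3 = -\frac{1}{3!}\int_{-\infty}^{\infty} F(t)(t^3 - 3t)\,dt = -\frac{\alpha_3}{3!},$

$c_4 = \frac{1}{4!}\int_{-\infty}^{\infty} F(t)(t^4 - 6t^2 + 3)\,dt = \frac{\alpha_4 - 3}{4!}.$

We have, therefore,

(52) $F(t) = \phi(t) - \frac{\alpha_3}{3!}\phi^{(3)}(t) + \frac{\alpha_4 - 3}{4!}\phi^{(4)}(t) + \cdots$

and $F(x) = \frac{1}{\sigma}F(t)$.

The values of $\phi(t)$, of its integral, and of its second to eighth
derivative, are given to five places of decimals in Glover's Tables.

* See Rietz, Mathematical Statistics, pp. 165-168.
† See Levy and Roth, Elements of Probability. Oxford. 1936.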
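
As a rough numerical companion to (52), the first three terms of the Type A series can be evaluated directly. The sketch assumes the relations $\phi^{(3)} = -H_3\phi$ and $\phi^{(4)} = H_4\phi$ from (46), with $H_3 = t^3 - 3t$ and $H_4 = t^4 - 6t^2 + 3$; the function names `phi` and `type_a` are hypothetical.

```python
import math

def phi(t):
    """Normal curve in standard units."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def type_a(t, alpha3, alpha4):
    """First three terms of the Gram-Charlier Type A series (52)."""
    h3 = t ** 3 - 3.0 * t
    h4 = t ** 4 - 6.0 * t ** 2 + 3.0
    # -(alpha3/3!) phi^(3) = +(alpha3/6) H_3 phi ;  ((alpha4-3)/4!) phi^(4) = ((alpha4-3)/24) H_4 phi
    return phi(t) * (1.0 + alpha3 / 6.0 * h3 + (alpha4 - 3.0) / 24.0 * h4)

if __name__ == "__main__":
    for t in range(-3, 4):
        print(t, round(type_a(t, 0.0, 3.0), 5), round(type_a(t, -1.2, 3.0), 5),
              round(type_a(t, -1.2, 4.2), 5))
```

With $\alpha_3 = 0$ and $\alpha_4 = 3$ the correction terms vanish and the normal curve itself is returned; changing the sign of $\alpha_3$ reflects the curve about $t = 0$, since $H_3$ is an odd function.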
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Exercises 

1. Prove that the points of inflection of the normal curve are equidistant from 

the mode. What are the coordinates of these points?

2. If $x$ has the distribution function $y = f(x)$, with total frequency 1, the mean
deviation, $M$, about the value $v$ is defined by

$M = \int_{-\infty}^{\infty} |x - v|\,f(x)\,dx.$

Prove that $M$ is a minimum when $v$ is the median, that is, when the
ordinate at $x = v$ bisects the area under $y = f(x)$.

Solution. We may write the expression for $M$ in the form

$M = \int_{-\infty}^{v} (v - x)f(x)\,dx + \int_{v}^{\infty} (x - v)f(x)\,dx.$

It is shown in treatises on advanced calculus that if

$H = \int_a^b f(x, \theta)\,dx,$

$\theta$ being a parameter and $a$ and $b$ being functions of $\theta$, then

$\frac{dH}{d\theta} = \int_a^b \frac{\partial f}{\partial\theta}\,dx - f(a, \theta)\frac{da}{d\theta} + f(b, \theta)\frac{db}{d\theta}.$

Therefore, differentiating $M$ with respect to $v$ and equating the result to
zero, we have

$\int_{-\infty}^{v} f(x)\,dx - \int_{v}^{\infty} f(x)\,dx = 0.$

So $M$ is a minimum when $\int_{-\infty}^{v} f(x)\,dx = \int_{v}^{\infty} f(x)\,dx$, that is, when the
partial areas to the right and left of $v$ are equal. (It is left to the student
to show that $M$ is actually a minimum when $dM/dv = 0$.)

3. Prove that the relation between the mean deviation (about the mean) and 
the standard deviation of the normal curve (in arbitrary units) is 

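$M = (2/\pi)^{1/2}\sigma = .798\sigma$, approximately.

Hint. By definition,

$M = \int_{-\infty}^{\infty} y\,|x - \bar{x}|\,dx = \sigma\int_{-\infty}^{\infty} \phi(t)\,|t|\,dt = 2\sigma\int_0^{\infty} \phi(t)\,t\,dt.$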

4. Suppose $x$ is distributed in accord with the frequency curve $y = Ce^{-x/a}$,
$0 \le x \le \infty$, $a$ being a positive constant and $C$ being determined by the
condition that the area under the curve is $N$. Evaluate $\nu_k$ successively
for $k = 1, 2, 3, 4$. Then find $\mu_k$ for $k = 2, 3, 4$, and finally obtain the
values $\bar{x} = a$, $\sigma = a$, $\alpha_3 = 2$, $\alpha_4 = 9$.

5. Given $f(x) = Cx^{n-1}e^{-x}$, $0 \le x \le \infty$, where $C$ is determined by the condition
that the area under the curve is unity. Evaluate $\nu_k$ for $k = 1$ to 4, $\mu_k$ for
$k = 2$ to 4, and $\alpha_k$ for $k = 3, 4$. Show that $\alpha_3$ and $\alpha_4$ satisfy the criterion

$2\alpha_4 - 3\alpha_3^2 - 6 = 0.$

6. State the differential equation underlying the Pearson system of frequency
curves and derive the equation of the normal curve as a special solution
of this equation. Evaluate the constant of integration so that the area
under the curve is unity.

7. Discuss the Type III curve. 

8 . Show that y in ( 22 ) vanishes when t «= —^ and i ^ to. 

9. Read Chapter III of the Carus Monograph on Mathematical Statistics by
H. L. Rietz.

10. Explain how the probability integral (37) may be evaluated for (a) small
values of $t$, (b) large values of $t$.

11. Evaluate (37) for (a) $t = \sqrt{2}/2$, (b) $t = 2\sqrt{2}$.

12. Consult the reference cited for the proofs of (47) and (48) and give a report
on them.

13. By successive differentiation of $\phi(t)$ evaluate $H_m(t)$ from (46) for $m = 1, 2,
3, 4$. Check your results with (49) for $n = 1, 2, 3$.

14. Making use of the biorthogonal property of Hermite polynomials and deriva¬ 

tives of the normal curve, derive the values of Cn, n = 0 to 4, in the Type A 
series. 

15. Taking $t = 0, \pm 1, \pm 2, \pm 3$, plot (52) on the same axes when (a) $\alpha_3 = 0$ and
$\alpha_4 = 3$, (b) $\alpha_3 = -1.2$ and $\alpha_4 = 3$, (c) $\alpha_3 = -1.2$ and $\alpha_4 = 4.2$. In (b) if
$\alpha_3 = 1.2$, what effect would this have on the curve?



CHAPTER IV 


JOINT DISTRIBUTIONS OF TWO VARIABLES. THE NORMAL 
CORRELATION SURFACE 

1. Fundamental Notions. Definitions of a frequency function of 
one variable and the associated notion of probability were given in 
Chapter III. Corresponding definitions will now be given for an 
arbitrary probability distribution of two variables. The continuous
variables $(x, y)$ have the joint probability function $f(x, y)$ if the
double integral of $f(x, y)$ over a region of the $(x, y)$-plane measures
the relative frequency of occurrence of pairs of values $(x, y)$ in that
region. It will be understood that $f(x, y)$ is continuous, single-valued,
and non-negative. If values of $(x, y)$ are restricted to a
finite region we define $f(x, y)$ to be identically zero outside that
region. In the extended region of definition, we have

(1) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1.$

Geometrically, this means that the volume under the surface
represented by $z = f(x, y)$ is unity. Then $f(x, y)\,dy\,dx$ is the probability
that simultaneously $x$ lies in the interval $(x, x + dx)$ and $y$ lies in
the interval $(y, y + dy)$. Consequently,

(2) $\int_a^b\int_c^d f(x, y)\,dy\,dx$

represents the probability that $x$ lies between $a$ and $b$ at the same
time that $y$ lies between $c$ and $d$.

We shall distinguish between two cases: (a) when the variables
are independent in the probability sense, and (b) when they are
correlated. Let the probability be $g(x)\,dx$ that $x$ occurs in $dx$ for
all $y$'s. Then integrating over all admissible values of $y$, we have

(3) $g(x)\,dx = dx\int_{-\infty}^{\infty} f(x, y)\,dy.$

It is clear that the integral in (3) gives $g(x)$ because the relative
frequency of occurrence of $x$ in any interval $(a, b)$ is the relative
frequency of pairs $(x, y)$ belonging to the strip of the $xy$-plane for
which $a < x < b$, and this is

$\int_a^b\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right]dx = \int_a^b g(x)\,dx.$


Similarly, if $h(y)\,dy$ is the probability that $y$ occurs in $dy$ for all
assignments of $x$, we have

(4) $h(y)\,dy = dy\int_{-\infty}^{\infty} f(x, y)\,dx.$

In accordance with convention we shall call $g(x)$ and $h(y)$ the marginal
distributions.

The independence of $x$ and $y$ is characterized by the following

Definition. The variables $x$ and $y$ are independent when $f(x, y)
= g(x)h(y)$. If $f(x, y)$ cannot be expressed identically as the product
of the marginal distributions, then $x$ and $y$ are said to be correlated.

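2. Moments. Let the general product moment about the common
origin of $x$ and $y$ be defined as follows:

(5) $\nu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x^m y^n\,dy\,dx.$

If $m = 0$ and $n = 1$, we have

(6) $\nu_{01} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,y\,dy\,dx.$

Let $f(x, y)$ be a function in which the order of integration may be
interchanged. Then $\nu_{01}$ becomes

$\nu_{01} = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(x, y)\,dx\right]y\,dy = \int_{-\infty}^{\infty} h(y)\,y\,dy,$

which is the mean, $\bar{y}$, of the $y$'s. Similarly, the mean of the $x$'s is

(7) $\nu_{10} = \bar{x} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x\,dy\,dx = \int_{-\infty}^{\infty} g(x)\,x\,dx.$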

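We will now define the general product moment about the means
$(\bar{x}, \bar{y})$ as follows:

(8) $\mu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^m (y - \bar{y})^n f(x, y)\,dy\,dx.$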




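When $m = n = 1$, we have

(9) $\mu_{11} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})(y - \bar{y})f(x, y)\,dy\,dx,$

which is styled the co-variance of the joint distribution.
When $m = 2$ and $n = 0$, we have the variance of $x$,

(10) $\mu_{20} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (x - \bar{x})^2 g(x)\,dx = \sigma_x^2.$

Similarly, when $m = 0$ and $n = 2$, we have the variance of $y$,

(11) $\mu_{02} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \bar{y})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (y - \bar{y})^2 h(y)\,dy = \sigma_y^2.$

It is left as an exercise for the student to show that

(12) $\mu_{11} = \nu_{11} - \nu_{10}\nu_{01}, \qquad \mu_{20} = \nu_{20} - \nu_{10}^2.$

The coefficient of correlation between $x$ and $y$, denoted by $\rho_{xy}$,
is defined by

(13) $\rho_{xy} = \frac{\mu_{11}}{\sigma_x\sigma_y}.$
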
3. Regression. If $y$ has been assigned in the joint probability
function $f(x, y)$, the probability that $x$ will lie in an infinitesimal
interval is

$\frac{f(x, y)}{h(y)}\,dx.$

Thus, when $y$ is fixed,

$\int_{-\infty}^{\infty} \frac{f(x, y)}{h(y)}\,dx = 1,$

and so $f(x, y)/h(y)$ is the probability function of $x$ for a fixed $y$.
It may be called the probability density representing a $y$ array of $x$'s.
Likewise, if we fix $x$ the probability density for an $x$ array of $y$'s
is given by $f(x, y)/g(x)$, since

$\int_{-\infty}^{\infty} \frac{f(x, y)}{g(x)}\,dy = 1$

when $x$ is fixed.

The notion of arrays may be made more concrete by thinking
of a joint distribution of the heights and weights of men. If $x$ refers
to weight and $y$ to height, then an example of an $x$ array of $y$'s is the
distribution of the heights of all men who weigh 150 pounds, and the
weights of all men who are six feet tall is an example of a $y$ array of $x$'s.
The mean of an $x$ array of $y$'s is

(14) $\bar{y}_x = \int y\,\frac{f(x, y)}{g(x)}\,dy,$

where the integration is performed over all values in the array
defined by $x$. Similarly, the mean of a $y$ array of $x$'s is

(15) $\bar{x}_y = \int x\,\frac{f(x, y)}{h(y)}\,dx,$

integrated over all $x$'s in an array for a fixed $y$.

The variance in an $x$ array of $y$'s is given by

(16) $s_{y\cdot x}^2 = \int (y - \bar{y}_x)^2\,\frac{f(x, y)}{g(x)}\,dy,$

integrated over all values in the array fixed by $x$. Similarly, the
variance in a $y$ array of $x$'s is

(17) $s_{x\cdot y}^2 = \int (x - \bar{x}_y)^2\,\frac{f(x, y)}{h(y)}\,dx,$

integrated over all values in the array fixed by $y$.

Taking different $x$ arrays of $y$'s fixes the mean points $\bar{y}_x$ and as $x$
varies continuously we get the locus of these means which is called the
regression curve of $y$ on $x$. Its equation is given by (14) where now, of
course, $x$ is a variable. Similarly, (15) gives the regression curve of $x$
on $y$. Of particular interest and use are the cases in which these
regression curves are straight lines. If the equation of the regression
curve of $y$ on $x$ is of the form

$\bar{y}_x = Ax + B,$

then the regression of $y$ on $x$ is said to be linear. Similarly, if the
equation of the regression curve of $x$ on $y$ is of the form

$\bar{x}_y = Cy + D,$

then the regression of $x$ on $y$ is said to be linear. If one regression
system is linear the other is not necessarily linear.

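Let us now consider the implications of linear regression on the
joint probability function $f(x, y)$ and the marginal totals $g(x)$ and
$h(y)$. Consider

$\bar{y}_x = \int_{-\infty}^{\infty} y\,\frac{f(x, y)}{g(x)}\,dy = Ax + B,$

or

(18) $\int_{-\infty}^{\infty} yf(x, y)\,dy = Axg(x) + Bg(x).$

Integrating each side of (18) with respect to $x$, and remembering
that we may interchange the order of integration, we have

$\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} yf(x, y)\,dy\right]dx = A\int_{-\infty}^{\infty} xg(x)\,dx + B\int_{-\infty}^{\infty} g(x)\,dx,$

or

(19) $\nu_{01} = A\nu_{10} + B.$

Multiplying each side of (18) by $x$ and integrating with respect to $x$,
we have

$\int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} yf(x, y)\,dy\right]dx = A\int_{-\infty}^{\infty} x^2 g(x)\,dx + B\int_{-\infty}^{\infty} xg(x)\,dx.$

Since the left member is

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xyf(x, y)\,dy\,dx = \nu_{11},$

we have

(20) $\nu_{11} = A\nu_{20} + B\nu_{10}.$
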




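A simultaneous solution of (19) and (20) yields

$A = \frac{\nu_{11} - \nu_{10}\nu_{01}}{\nu_{20} - \nu_{10}^2} = \frac{\mu_{11}}{\mu_{20}} = \rho\,\frac{\sigma_y}{\sigma_x},$

$B = \nu_{01} - \nu_{10}\,\rho\,\frac{\sigma_y}{\sigma_x} = \bar{y} - \bar{x}\,\rho\,\frac{\sigma_y}{\sigma_x}.$

Therefore the equation of the line of regression of $y$ on $x$ becomes

(21) $\bar{y}_x - \bar{y} = \rho\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$

In an analogous manner, if the regression of $x$ on $y$ is linear the
regression line has the equation

(22) $\bar{x}_y - \bar{x} = \rho\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}).$

The quantities $A = \rho(\sigma_y/\sigma_x)$ and $C = \rho(\sigma_x/\sigma_y)$ are called
regression coefficients. It is obvious that their product is $\rho^2$.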

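Example. Given

$f(x, y) = \frac{2}{a^2}, \qquad 0 \le x \le y, \quad 0 \le y \le a,$

as the joint probability function of two variables $x$ and $y$. Find (i) the
marginal totals $g(x)$ and $h(y)$; (ii) the mean and variance of each of the
marginal totals, i.e., $\nu_{10}$ and $\sigma_x^2 = \mu_{20}$ for $g(x)$, $\nu_{01}$
and $\sigma_y^2 = \mu_{02}$ for $h(y)$; (iii) the equations of the regression curves of $y$ on $x$
and of $x$ on $y$, $\bar{y}_x$ and $\bar{x}_y$; (iv) the correlation coefficient $\rho$.

Solutions. The volume under the surface represented by the given function is
unity. Thus

$\int_0^a\int_0^y \frac{2}{a^2}\,dx\,dy = \frac{2}{a^2}\int_0^a y\,dy = 1.$

The surface is shown in the accompanying figure.

(i) The marginal totals are

$g(x) = \int_x^a \frac{2}{a^2}\,dy = \frac{2}{a^2}(a - x),$

$h(y) = \int_0^y \frac{2}{a^2}\,dx = \frac{2y}{a^2}.$

(ii) The means are

$\nu_{10} = \bar{x} = \int_0^a x\,\frac{2}{a^2}(a - x)\,dx = \frac{a}{3},$

$\nu_{01} = \bar{y} = \int_0^a y\,\frac{2y}{a^2}\,dy = \frac{2a}{3}.$

Since

$\nu_{20} = \int_0^a x^2\,\frac{2}{a^2}(a - x)\,dx = \frac{a^2}{6},$

$\nu_{02} = \int_0^a y^2\,\frac{2y}{a^2}\,dy = \frac{a^2}{2},$

the variances are

$\sigma_x^2 = \frac{a^2}{6} - \frac{a^2}{9} = \frac{a^2}{18}, \qquad \sigma_y^2 = \frac{a^2}{2} - \frac{4a^2}{9} = \frac{a^2}{18}.$

(iii) The regression lines are

$\bar{y}_x = \int_x^a y\,\frac{2/a^2}{2(a - x)/a^2}\,dy = \frac{a + x}{2},$

$\bar{x}_y = \int_0^y x\,\frac{2/a^2}{2y/a^2}\,dx = \frac{y}{2}.$

(iv) From the equations of the regression lines it follows that $\rho^2 = \frac{1}{4}$ and
$\rho = \frac{1}{2}$ since $\rho(\sigma_y/\sigma_x)$ is positive.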
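
The figures in this example can be verified with exact fractions. The sketch below is illustrative only and its names (`nu`, `triangle_summary`) are hypothetical. It uses the closed form $\nu_{mn} = \frac{2}{a^2}\cdot\frac{a^{m+n+2}}{(m+1)(m+n+2)}$, obtained by integrating $x^m y^n$ over $0 \le x \le y \le a$, and then applies (12), (13), and the solution of (19) and (20).

```python
from fractions import Fraction

def nu(m, n, a):
    """Product moment nu_mn of f(x, y) = 2/a^2 on the triangle 0 <= x <= y <= a."""
    # inner integral over x from 0 to y gives y^(m+1)/(m+1);
    # outer integral over y from 0 to a gives a^(m+n+2)/(m+n+2)
    return Fraction(2) / a ** 2 * a ** (m + n + 2) / ((m + 1) * (m + n + 2))

def triangle_summary(a):
    a = Fraction(a)
    mean_x, mean_y = nu(1, 0, a), nu(0, 1, a)
    var_x = nu(2, 0, a) - mean_x ** 2            # mu_20, from (12)
    var_y = nu(0, 2, a) - mean_y ** 2            # mu_02
    cov = nu(1, 1, a) - mean_x * mean_y          # mu_11, from (12)
    A = cov / var_x                              # slope of the regression of y on x
    B = mean_y - A * mean_x                      # intercept, from (19)
    C = cov / var_y                              # slope of the regression of x on y
    D = mean_x - C * mean_y
    return {"mean_x": mean_x, "mean_y": mean_y, "var_x": var_x, "var_y": var_y,
            "A": A, "B": B, "C": C, "D": D, "rho_squared": cov ** 2 / (var_x * var_y)}

if __name__ == "__main__":
    for key, value in triangle_summary(1).items():
        print(key, value)
```

The product of the two slopes $A$ and $C$ equals `rho_squared`, as the text remarks for regression coefficients in general.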


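4. The Standard Error of Estimate. We have seen that the
probability density in an $x$ array of $y$'s is $f(x, y)/g(x)$. Then the
variance within such an array is

$s_{y\cdot x}^2 = \int_{-\infty}^{\infty} (y - \bar{y}_x)^2\,\frac{f(x, y)}{g(x)}\,dy.$

The mean, over all $x$ arrays, of values of $s_{y\cdot x}^2$ weighted with the
marginal distribution of $x$ is denoted by $S_y^2$, and $S_y$ is called the
standard error of estimate. We will now show that, when the regression
of $y$ on $x$ is linear, $S_y^2 = \sigma_y^2(1 - \rho^2)$.
By definition,

$S_y^2 = \int_{-\infty}^{\infty} g(x)\,s_{y\cdot x}^2\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \bar{y}_x)^2 f(x, y)\,dy\,dx.$

Using the value of $\bar{y}_x$ given in (21) the above expression becomes

$S_y^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[y - \bar{y} - \rho\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right]^2 f(x, y)\,dy\,dx$

$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left\{(y - \bar{y})^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}(y - \bar{y})(x - \bar{x}) + \rho^2\,\frac{\sigma_y^2}{\sigma_x^2}(x - \bar{x})^2\right\} f(x, y)\,dy\,dx,$

and, since the three integrals are $\mu_{02} = \sigma_y^2$, $\mu_{11} = \rho\sigma_x\sigma_y$, and
$\mu_{20} = \sigma_x^2$, the right member simplifies so that we have the result

$S_y^2 = \sigma_y^2(1 - \rho^2).$

From this result it follows, because $S_y^2$ cannot be negative, that

$-1 \le \rho \le 1.$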
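
The identity $S_y^2 = \sigma_y^2(1 - \rho^2)$ can be tried on the triangular distribution $f(x, y) = 2/a^2$, $0 \le x \le y \le a$, treated in the preceding example, whose regression of $y$ on $x$ is linear. The sketch below is an assumption-laden illustration, not part of the text: in an $x$ array, $y$ is uniformly spread over $(x, a)$, so the array variance is taken as $(a - x)^2/12$, and the weighted mean over arrays is formed by a simple midpoint rule. The function name is hypothetical.

```python
def standard_error_squared(a=1.0, steps=20000):
    """Mean of the array variances s_{y.x}^2, weighted by the marginal g(x)."""
    dx = a / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx                     # midpoint of the i-th subinterval
        g = 2.0 * (a - x) / a ** 2             # marginal density of x
        array_variance = (a - x) ** 2 / 12.0   # variance of y uniform on (x, a)
        total += array_variance * g * dx
    return total

if __name__ == "__main__":
    a = 1.0
    sigma_y_squared = a * a / 18.0             # from the worked example
    rho = 0.5
    print(standard_error_squared(a), sigma_y_squared * (1.0 - rho ** 2))
```

Both printed numbers should agree closely with $a^2/24$.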

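5. The Normal Correlation Surface. We shall now consider a
joint probability function of special interest. The normal correlation
surface is defined by the following function

(23) $f(x, y) = K\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right\},$

where

$\frac{1}{K} = 2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2},$

$-\infty \le x \le \infty, \qquad -\infty \le y \le \infty,$

and the variables $x$ and $y$ have the origin of their reference system at their
respective means, that is,

(24) $\bar{x} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xf(x, y)\,dy\,dx = 0, \qquad \bar{y} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yf(x, y)\,dy\,dx = 0.$

These conditions (24) may be imposed without essential loss of
generality and will simplify the algebraic discussion.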





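The marginal distribution of $x$ is given by

$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

$= Ke^{-x^2/2\sigma_x^2}\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left(\frac{y}{\sigma_y} - \frac{\rho x}{\sigma_x}\right)^2\right\}dy$

$= Ke^{-x^2/2\sigma_x^2}\,\{2\pi(1 - \rho^2)\}^{1/2}\,\sigma_y$

(25) $= \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2}.$

Similarly, the marginal distribution of $y$ is

(26) $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-y^2/2\sigma_y^2}.$

Hence we may state

Theorem I. If two variables are normally correlated, each variable
is normally distributed in its marginal totals.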

That the converse is not necessarily true is shown by the following 
illustration. Consider a clay model of a normal 
correlation surface such that its marginal totals 
are necessarily normal distributions by the above 
theorem. Quantities of the clay can be redis¬ 
tributed by piling up in certain spots the clay 
that is scooped out in other spots in such a way 
that the marginal totals are not disturbed. It 
is obvious that the resulting surface is not one that is defined 
by (23). 

Other interesting properties of normally correlated variables are 
described by the following theorems. 

Theorem II. The regression systems of a normal correlation surface
are linear.

The proof is a matter of integration. Let us find the probability
function of an $x$ array of $y$'s. By definition, this is given by
$f(x, y)/g(x)$. To get the mean of such an array we must multiply
its probability distribution by $y$ and integrate over all values of
$y$ in the array. Thus we have

$\bar{y}_x = \int_{-\infty}^{\infty} \frac{yf(x, y)}{g(x)}\,dy = \frac{1}{\sigma_y\{2\pi(1 - \rho^2)\}^{1/2}}\int_{-\infty}^{\infty} y\exp\left\{-\frac{(y - x\rho\sigma_y/\sigma_x)^2}{2\sigma_y^2(1 - \rho^2)}\right\}dy = \frac{x\rho\sigma_y}{\sigma_x}.$

In the exercises at the end of the chapter the student is asked to
verify the above result. If $x$ is allowed to vary over the arrays, it is
evident that the locus of the means of the $x$ arrays of $y$'s is the line

(27) $\bar{y}_x = \frac{x\rho\sigma_y}{\sigma_x}.$

In a similar way the mean of a $y$ array of $x$'s is given by

$\bar{x}_y = \int_{-\infty}^{\infty} \frac{xf(x, y)}{h(y)}\,dx = \frac{y\rho\sigma_x}{\sigma_y},$

and this lies on the regression line

(28) $\bar{x}_y = \frac{y\rho\sigma_x}{\sigma_y}.$


While it is an intrinsic property of a normal correlation surface 
that both regressions are linear, one should not infer that this is 
characteristic of joint probability functions in general. One or both 
or neither of the regression systems of an arbitrary distribution 
function may be linear. The student will observe that the definition 
of the correlation coefficient did not involve the condition that
$f(x, y)$ was normal nor that regression was linear. Although the
definition of a correlation coefficient does not require linear regression, 
nevertheless the correlation coefficient may fail to measure the 
correlation in the case of appreciable non-linear regression. 

Theorem III. If $x$ and $y$ are normally correlated, then each array is
a normal distribution with constant variance $S_y^2$ from one array of $y$'s
to another and constant variance $S_x^2$ from one array of $x$'s to another.

The proof consists in exhibiting the distribution function for an
$x$ array of $y$'s and for a $y$ array of $x$'s. Thus, for the first case we have

(29) $\frac{f(x, y)}{g(x)} = \frac{1}{\sqrt{2\pi}\,S_y}\,e^{-(y - \bar{y}_x)^2/2S_y^2},$

where $S_y^2 = \sigma_y^2(1 - \rho^2)$. Evidently, this is a normal distribution
with variance $S_y^2$ which is independent of $x$ and therefore is constant
over all $x$ arrays. It is left as an exercise for the student to give the
companion proof for the arrays in the $y$ direction.

When the variance is constant over the arrays in the $x$ direction
the regression system of $y$ on $x$ is said to be homoscedastic (equally
scattered). Similarly for the $y$ direction. A geometrical representation
of a normal correlation surface is given in Part I, § 18 of
Chapter VIII.
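
Theorems I to III lend themselves to a quick numerical check. The following sketch is offered only as an illustration under stated assumptions: the function names are hypothetical, the integration is a plain midpoint rule over a wide finite range, and the parameter values are arbitrary. It evaluates (23) and, for one fixed $x$, recovers the marginal ordinate $g(x)$ of (25), the array mean of (27), and the array variance $\sigma_y^2(1 - \rho^2)$ of (29).

```python
import math

def normal_surface(x, y, sx, sy, rho):
    """Ordinate of the normal correlation surface (23), origin at the means."""
    k = 1.0 / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))
    q = (x * x / sx ** 2 - 2.0 * rho * x * y / (sx * sy) + y * y / sy ** 2)
    return k * math.exp(-q / (2.0 * (1.0 - rho * rho)))

def array_statistics(x, sx, sy, rho, lo=-12.0, hi=12.0, steps=2400):
    """Return (g(x), mean, variance) for the x array of y's by midpoint integration."""
    dy = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        w = normal_surface(x, y, sx, sy, rho) * dy
        m0 += w                 # accumulates g(x)
        m1 += y * w             # accumulates g(x) * mean
        m2 += y * y * w
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean * mean

if __name__ == "__main__":
    sx, sy, rho, x = 2.0, 1.5, 0.6, 1.0
    g, mean, var = array_statistics(x, sx, sy, rho)
    print(g, math.exp(-x * x / (2 * sx * sx)) / (sx * math.sqrt(2 * math.pi)))
    print(mean, x * rho * sy / sx)
    print(var, sy * sy * (1 - rho * rho))
```

Each printed pair should agree to many decimal places; repeating the run with a different $x$ leaves the variance unchanged, which is the homoscedastic property.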

6. Limiting Forms. Suppose a plane is passed through the surface
defined by (23) parallel to the $xy$-plane. Analytically, this means
that we let $f(x, y) = c$ where $c$ is some constant less than the maximum
value of the function, that is, we take $0 < c < K$ to insure a real
intersection. We obtain

(30) $\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} = \lambda^2,$

where

(30a) $\lambda^2 = 2(1 - \rho^2)\log_e\frac{K}{c},$

which is obviously not negative. Thus the points $(x, y)$ for which
the probability density is constant lie on an ellipse.

It is easier to study (30) if we transform the variables to standard
units by letting $t_x = x/\sigma_x$ and $t_y = y/\sigma_y$. Then (30) becomes

(31) $t_x^2 - 2\rho t_x t_y + t_y^2 = \lambda^2.$

The cross-product term will vanish under the transformations

$t_x = u\cos\theta - v\sin\theta,$
$t_y = u\sin\theta + v\cos\theta,$

when $\theta = \pi/4$. So the required rotation formulas are

(32) $t_x = \frac{u - v}{\sqrt{2}} \quad\text{and}\quad t_y = \frac{u + v}{\sqrt{2}}.$

Applying these to (31) we obtain

(33) $u^2(1 - \rho) + v^2(1 + \rho) = \lambda^2$

which may be written in the standard form

(34) $\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1,$

where

$a^2 = \frac{\lambda^2}{1 - \rho}, \qquad b^2 = \frac{\lambda^2}{1 + \rho}.$

The eccentricity of the ellipse (34) is $(1 - b^2/a^2)^{1/2} = [2\rho/(1 + \rho)]^{1/2}$.

We see that $b \to a$ as $\rho \to 0$. When $\rho = 0$, $b = a = \lambda$. Then (34)
would be a circle, and (23) would be a surface of revolution if the
variables were expressed in standard units. When $\rho = 1$, it follows
from (33) and (30a) that $v = 0$. From (32) it is seen that the line
$v = 0$ is the same as $t_y = t_x$ and the ellipse has degenerated into a
straight line. The surface then shrinks into a normal curve in the
plane $t_y = t_x$.
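
The semi-axes and eccentricity of the contour ellipse follow at once from (30a) and (34). The short sketch below is illustrative only; the function name and the choice of passing the ratio $c/K$ as an argument are assumptions made for the example, and $0 < \rho < 1$ is assumed so that $a$ is the major semi-axis.

```python
import math

def contour_ellipse(rho, c_over_k):
    """Semi-axes a, b and eccentricity of the ellipse (34) in standard units."""
    lam_sq = 2.0 * (1.0 - rho ** 2) * math.log(1.0 / c_over_k)   # (30a)
    a = math.sqrt(lam_sq / (1.0 - rho))
    b = math.sqrt(lam_sq / (1.0 + rho))
    eccentricity = math.sqrt(1.0 - (b / a) ** 2)
    return a, b, eccentricity

if __name__ == "__main__":
    a, b, e = contour_ellipse(0.6, 0.5)
    print(a / b, e, math.sqrt(2 * 0.6 / 1.6))
```

The ratio $a/b$ depends on $\rho$ alone, not on the height $c$ of the cutting plane, so all contour ellipses of one surface are similar.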

7. Tetrachoric Correlation. The word tetrachoric refers to a
2 × 2 fold table. Suppose $N$ objects are classified according as
they possess one or both or neither of two qualitative traits or attributes
which may, for convenience, be denoted by I and II. Such
a classification will yield a four fold table as shown in Table 4,

Table 4

             Not II      II        Total
Not I        a           b         a + b
I            c           d         c + d
Total        a + c       b + d     N

where $a + b + c + d = N$, the four classes being mutually exclusive
but not necessarily exhaustive. The attributes may sometimes
admit also of quantitative measurement but we are considering only
the case where they are classified in dichotomy, such as "tall" and
"not tall," "male" and "female," "alive" and "dead," "good"
and "bad," "dull" and "not dull," etc. An example is the following
classification of 26,287 children where attribute I is dullness and
attribute II is developmental defects.

Table 5. (K. Pearson, Tables, p. li)

             Without      With
             Defects      Defects     Totals
Not Dull     22,793       1,420       24,213
Dull          1,186         888        2,074
Totals       23,979       2,308       26,287


The problem in such classifications is to measure the intensity 
of association between the two attributes in the set. Let us suppose 
that our data had been given initially so that a fine division into 
many cells was possible and that the result would have presented 
a normal correlation surface. If this surface were then divided into 
four cells by planes x = h and y = k to yield the relative frequencies 
observed, then the correlation coefficient that characterizes this 
normal correlation surface is called tetrachoric $r$. It will be denoted
by $r_t$.

K. Pearson has given a method and tables for determining $r_t$.
(Cf. Tables for Statisticians and Biometricians, Part I.) The pro¬ 
cedure may be indicated by the following diagram and skeleton 
solution for our example, Table 5. (The details will be found in the 
reference cited.) 

Solution of Example. (See Figure 11.)

$\int_h^\infty \phi(t)\,dt = \frac{2074}{26287} = .078,898, \quad h = 1.413;$

$\int_k^\infty \phi(t)\,dt = \frac{2308}{26287} = .087,800, \quad k = 1.354.$

Entering Pearson's Tables for the above values of $h$ and $k$ and interpolating, it is
found that $r_t = .652$.

The determination of $r_t$ by Pearson's method is rather tedious when $.2 \le |r_t|
\le .8$. This burden has been lifted by two fairly recent publications. Camp has
given in his text (pp. 307-310) an ingenious and simple method for approximating
$r_t$. His scheme is interesting from the mathematical as well as the practical
point of view. In Computing Diagrams for the Tetrachoric Correlation Coefficient
by Thurstone et al. (available at the University of Chicago Bookstore), a useful
approximation to $r_t$ can be determined by inspection.

Fig. 11. [Diagram: the normal correlation surface divided into four cells by the
lines $x = h$ and $y = k$, with the marginal proportions expressed as integrals of $\phi(t)$.]
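
Where Pearson's tables are not at hand, $r_t$ can be approximated by direct computation. The sketch below is not Pearson's tabular method; it is a plain numerical substitute under stated assumptions. The dividing points are found from the marginal proportions with the standard library's `statistics.NormalDist`, the proportion in the cell possessing both attributes is written as a single integral over the normal correlation surface in standard units, and $r_t$ is located by bisection. The function names, the assignment of $h$ to attribute I and $k$ to attribute II, and the integration settings are all choices made for the illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # normal curve in standard units

def upper_quadrant(h, k, r, steps=4000, span=10.0):
    """Proportion with x > h and y > k on a normal correlation surface, correlation r."""
    dx = span / steps
    s = math.sqrt(1.0 - r * r)
    total = 0.0
    for i in range(steps):
        x = h + (i + 0.5) * dx
        # for fixed x, the y array is normal with mean r*x and standard deviation s
        total += STD.pdf(x) * (1.0 - STD.cdf((k - r * x) / s)) * dx
    return total

def solve_r(h, k, target, tol=1e-6):
    """Bisection for r; the quadrant proportion increases with r."""
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_quadrant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tetrachoric(a, b, c, d):
    """Return (h, k, r_t) for the fourfold table laid out as in Table 4."""
    n = a + b + c + d
    h = STD.inv_cdf(1.0 - (c + d) / n)   # division for attribute I (rows)
    k = STD.inv_cdf(1.0 - (b + d) / n)   # division for attribute II (columns)
    return h, k, solve_r(h, k, d / n)

if __name__ == "__main__":
    print(tetrachoric(22793, 1420, 1186, 888))
```

Applied to Table 5, the computed dividing points can be compared with those in the skeleton solution, and the resulting $r_t$ with the value read from Pearson's tables.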

Exercises 

1. Show that the definition of $\rho$ may be written in the form

$\rho = \frac{1}{\sigma_x\sigma_y}\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xyf(x, y)\,dy\,dx - \bar{x}\bar{y}\right].$

2. Given that $f(x, y) = 2/a^2$, $0 \le x \le y$, $0 \le y \le a$. Show that both regression
systems are linear. Evaluate $\rho$.

3. Derive (22).

4. Prove that the area of the ellipse (30) is $\pi\lambda^2\sigma_x\sigma_y/(1 - \rho^2)^{1/2}$.

5. (a) If $\rho = .6$ show that the ratio between the major and minor axes of
the ellipse is 2.

(b) Show that the slope of the regression line of $y$ on $x$ for a normal
correlation surface is $\rho/(1 - \rho^2)^{1/2}$ in units of $S_y$ and $\sigma_x$.
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6. Establish the truth or falsity of the following proposition: A necessary and 

sufficient condition that two variables be normally correlated is that their 
regression systems be linear. 

7. Prove that the regression systems of two normally correlated variables are 

linear and homoscedastic. 


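8. For (23) prove the following:

(a) the mean value of $\bar{y}_x$ taken over all values of $x$ is zero,

(b) the variance of $\bar{y}_x$ is equal to $\rho^2\sigma_y^2$,

(c) the correlation coefficient between $\bar{y}_x$ and $y$ is equal to $\rho$.

Hints.

(a) Evaluate $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{y}_x f(x, y)\,dy\,dx.$

(b) $\sigma_{\bar{y}_x}^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{y}_x^2 f(x, y)\,dy\,dx.$

(c) Evaluate $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\bar{y}_x\,y}{\sigma_{\bar{y}_x}\sigma_y}\,f(x, y)\,dy\,dx.$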


9. If $x$ and $y$ are discrete variables, $\rho$ is defined by

$\rho = \frac{E(xy) - E(x)E(y)}{\sigma_x\sigma_y},$

where

$\sigma_x = [E(x^2) - \{E(x)\}^2]^{1/2}, \qquad \sigma_y = [E(y^2) - \{E(y)\}^2]^{1/2},$

and

$E(x) = \sum_{i=1}^{n} x_i g(x_i),$

$E(y) = \sum_{j=1}^{m} y_j h(y_j),$

$E(xy) = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j f(x_i, y_j),$

$f(x_i, y_j)$ being the probability for the simultaneous occurrence of the pair
of values $(x_i, y_j)$, $g(x_i) = \sum_{j=1}^{m} f(x_i, y_j)$ and $h(y_j) = \sum_{i=1}^{n} f(x_i, y_j)$ being the
marginal distributions of $x$ and $y$, respectively. Find $\rho$ for the table in
Exercise 8, § 13, Chapter I.

10. Investigate the references given for tetrachoric r and give a report on the 
results of your study. 



CHAPTER V 

MULTIPLE AND PARTIAL CORRELATION 

1. Notation. Simple correlation theory deals with co-variation 
in two variables. If other factors are involved the two variables 
are assessed as the important ones for the investigation and the 
other factors are ignored. But situations frequently arise in the 
fields of agriculture, biology, economics, education, and psychology, 
which call for consideration of three or more influences bearing 
simultaneously on a problem, and hence for the investigation of 
interrelations among three or more variables. For example, crop 
yield varies with soil fertility, rainfall, and temperature; wheat 
production is affected by acreage planted and yield per acre; students'
honor points are connected with intelligence, health, hours
of study, etc.; their chest measurements vary with stature and
weight.

The term multiple correlation refers to a theory of correlation
involving three or more variables. For ease in exposition we shall
restrict the derivation of formulas to the three-variable case although 



the method is perfectly general. When the three-variable case is 
understood the formulas can be generalized for k variables. 

The framework of a two-way table was a rectangle in the $xy$-plane
which was divided into cells by lines parallel to the axes. The
analogue in the case of three variables, which we shall denote by
$x$, $y$, and $z$, is a rectangular parallelopiped divided into cells by slicing
planes parallel to the axes.

We shall denote the frequency in the cell whose mid-point has the
coordinates $(x, y, z)$ by $f(x, y, z)$. A pair of $(x, y)$ values fixes a
$z$ column (Figure 12), and the sum of the frequencies in such a
column is the "column total"

(1) $\sum_z f(x, y, z) = f(x, y),$

where here and subsequently the symbol $\sum$ together with the
variable underneath denotes a summation in the direction of that
variable. Now consider all those columns which have the same
$y$. Their total frequency, denoted by

(2) $\sum_x f(x, y) = S(y),$

may appropriately be called a "slab total" (Figure 13).



Finally, if we add all the slab totals we get the total frequency $N$.
Thus

(3) $\sum_y S(y) = N.$

By making use of (1) we may, if we wish, express (2) as the double sum

(4) $\sum_x\sum_z f(x, y, z) = S(y),$

and using (4) we may express (3) as the triple sum

(5) $\sum_x\sum_y\sum_z f(x, y, z) = N.$
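
The column totals, slab totals, and total frequency of (1) to (5) amount to nested sums over a three-way table. A minimal sketch follows, assuming the frequencies are held in a nested list `freq[i][j][k]` for the cell at $(x_i, y_j, z_k)$; the function name and the small table of frequencies are made up purely for illustration.

```python
def column_and_slab_totals(freq):
    """Return (column totals f(x, y), slab totals S(y), total frequency N)."""
    nx, ny = len(freq), len(freq[0])
    # (1): sum each z column, giving the two-way table f(x, y)
    column = [[sum(freq[i][j]) for j in range(ny)] for i in range(nx)]
    # (2): for each y, add the columns over x, giving the slab total S(y)
    slab = [sum(column[i][j] for i in range(nx)) for j in range(ny)]
    # (3): add the slab totals
    return column, slab, sum(slab)

if __name__ == "__main__":
    # hypothetical frequencies: 2 values of x, 3 of y, 2 of z
    freq = [[[1, 2], [3, 4], [0, 1]],
            [[2, 0], [1, 1], [5, 2]]]
    column, slab, n = column_and_slab_totals(freq)
    print(column, slab, n)
```

The total obtained through the slabs agrees with the plain triple sum (5) over all cells, whatever the order of summation.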

(a) The aggregate of the column totals $f(x, y)$ forms a two-way
frequency table. If we imagine the numerical values of these
frequencies written in the cells of the $xy$-plane it is easy to see that
they constitute a correlation table (Figure 14). For this table, the
simple correlation coefficient $r_{xy}$ is called the total correlation (in
contradistinction to a partial correlation coefficient to be defined
later) and the regression curves are called the total regressions of
$y$ on $x$ and $x$ on $y$. Discussions analogous to (a) will now be given
for horizontal columns parallel (b) to $Ox$ and (c) to $Oy$.



(b) A pair of $(y, z)$ values fixes an $x$ column parallel to $Ox$. The
sum of the frequencies in an $x$ column is

(6) $\sum_x f(x, y, z) = f(y, z).$

If we add all those columns which have the same $z$ we get a slab
perpendicular to $z$ whose total is

(7) $\sum_y f(y, z) = S(z).$

Finally, the total of all such slabs is

(8) $\sum_z S(z) = N.$

The numerical values of the totals $f(y, z)$ written, if desired, in the
cells of the $yz$-plane form a two-way correlation table, as represented in
Figure 15. For this table, $r_{yz}$ is the total correlation coefficient
between $y$ and $z$, and the regression curves are the total regressions of $y$ on
$z$ and $z$ on $y$.

(c) Similarly, a pair of $(x, z)$ values fixes a $y$ column parallel to $Oy$. The
sum of the frequencies in such a column is

(9) $\sum_y f(x, y, z) = f(x, z).$

If we add all the columns which have the same $x$ we get a slab perpendicular
to $x$ whose total is

(10) $\sum_z f(x, z) = S(x).$

The sum of all such slabs is

(11) $\sum_x S(x) = N.$


The numerical values of the column totals $f(x, z)$ constitute a two-way
correlation table whose correlation coefficient $r_{xz}$ is the total
correlation between $x$ and $z$. The total regressions of $x$ on $z$ and $z$ on $x$
are given by the regression curves of this table (Figure 16).

2. Regression. The mean of a $z$ column at $(x, y)$ is defined by

(12) $\bar{z}(x, y) = \frac{1}{f(x, y)}\sum_z z\,f(x, y, z).$

Similarly, the mean of an $x$ column at $(y, z)$ is

(13) $\bar{x}(y, z) = \frac{1}{f(y, z)}\sum_x x\,f(x, y, z),$

and the mean of a $y$ column at $(x, z)$ is

(14) $\bar{y}(x, z) = \frac{1}{f(x, z)}\sum_y y\,f(x, y, z).$

The regression plane of z on xy is that plane which fits the means 
of the z columns best in a least-squares sense. This should not be 
confused with the true regression surface, $z$ on $xy$, which is defined
as the locus of the mean points of the z columns. More accurately, 
it is the locus of these points as the dimensions of the cells approach 
zero. The regression plane, z on xy, is that plane which fits best the 
true regression surface, z on xy. Corresponding statements hold for 
the regression planes of y on xz and of x on yz. 

So far, it was convenient to designate our variables by the conventional
letters used in representing three-dimensional space. We
are now about to obtain the equations of the regression planes and
in order to extend our results to $k$ variables it will be desirable to
change to a new set of symbols which will lend themselves more
readily to generalization. The switch will cause no difficulty. We
shall now use $x_1$ in place of $z$, $x_2$ in place of $x$, and $x_3$ in place
of $y$. The relations between the $r$'s in the old notation and the new are
$r_{xy} = r_{23}$, $r_{yz} = r_{13}$, $r_{xz} = r_{12}$. The adjacent diagram
will help us keep in mind the relations between the new symbols
and the old.

We shall now derive the equation of the regression plane of $x_1$
on $x_2$ and $x_3$. In determining, under a least-squares criterion, the
parameters in its equation it will simplify the exposition if we assume
that the variables are measured from their respective means as origin.
This may be assumed without loss of generality. Let the desired
equation be of the form

(15) $x_1 = Ax_2 + Bx_3 + C.$
Then we may determine the parameters in (15) so that the sum of
the squares of the residuals

(16)    $U = \sum_{1,2,3} (x_1 - A x_2 - B x_3 - C)^2 f$

is a minimum, $f$ being short for $f(x_1, x_2, x_3)$, and $\sum_{1,2,3}$
for $\sum_{x_1} \sum_{x_2} \sum_{x_3}$.

Equating to zero the first partial derivatives of U with respect to
A, B, and C, we obtain the equations

        $\sum x_2 (x_1 - A x_2 - B x_3 - C) f = 0,$

        $\sum x_3 (x_1 - A x_2 - B x_3 - C) f = 0,$

        $C = 0.$

The last equation is the simplified form of
$\sum (x_1 - A x_2 - B x_3 - C) f = 0$. The simplification is a
consequence of our choice of origin since
$\sum x_1 f = \sum x_2 f = \sum x_3 f = 0$ when the origin of $x_i$
is at the mean of its N values. The first two equations may be
written in the form

(17)    $A \sum x_2^2 f + B \sum x_2 x_3 f = \sum x_1 x_2 f,$
        $A \sum x_2 x_3 f + B \sum x_3^2 f = \sum x_1 x_3 f.$


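Let $\sigma_i^2$ be the variance of $x_i$ and let $r_{ij}$ be the
correlation coefficient between $x_i$ and $x_j$. Then by definition,

        $\sum x_i^2 f(x_1, x_2, x_3) = N \sigma_i^2,$

        $\sum x_i x_j f(x_1, x_2, x_3) = N \sigma_i \sigma_j r_{ij}.$

So (17) becomes

(18)    $N A \sigma_2^2 + N B \sigma_2 \sigma_3 r_{23} = N \sigma_1 \sigma_2 r_{12},$
        $N A \sigma_2 \sigma_3 r_{23} + N B \sigma_3^2 = N \sigma_1 \sigma_3 r_{13}.$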


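Solving for A and B we have

        $A = \dfrac{\sigma_1}{\sigma_2} \cdot \dfrac{\begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}, \qquad B = \dfrac{\sigma_1}{\sigma_3} \cdot \dfrac{\begin{vmatrix} 1 & r_{12} \\ r_{23} & r_{13} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}.$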
It is convenient both for simplicity and for the purpose of
generalizing to k variables to define the determinant R by

        $R = \begin{vmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{vmatrix}$

and to let $R_{ij}$ be the cofactor of $r_{ij}$, that is, the minor of
$r_{ij}$ including the sign factor $(-1)^{i+j}$. Thus,

        $R_{12} = -\begin{vmatrix} r_{21} & r_{23} \\ r_{31} & r_{33} \end{vmatrix}, \qquad R_{13} = \begin{vmatrix} r_{21} & r_{22} \\ r_{31} & r_{32} \end{vmatrix}.$

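Clearly, $r_{11} = r_{22} = r_{33} = 1$, and $r_{12} = r_{21}$, etc., so
the expressions for A and B may be written

        $A = -\dfrac{\sigma_1 R_{12}}{\sigma_2 R_{11}}, \qquad B = -\dfrac{\sigma_1 R_{13}}{\sigma_3 R_{11}}.$

Hence (15) becomes

(19)    $\dfrac{x_1}{\sigma_1} R_{11} + \dfrac{x_2}{\sigma_2} R_{12} + \dfrac{x_3}{\sigma_3} R_{13} = 0.$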


This equation gives the most probable value of $x_1$ for assigned values
of $x_2$ and $x_3$, provided that the true regression is not far from being
linear and the distribution of each $x_1$ column is nearly symmetrical
so that its mode is close to its mean. It is an important equation
because it shows how, on the average, changes in $x_2$ and $x_3$ affect
$x_1$. The student will observe that the R's involve only simple
correlation coefficients and that all the necessary computations for
the terms in (19) were explained in Part I.
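
The passage from (18) to (19) is easily checked numerically. The
following Python sketch is offered only as an illustration: the helper
names `det2`, `cofactors_row1`, and `regression_coefficients`, and the
numerical values of the standard deviations and correlations, are
assumptions of ours and do not come from the text. It computes A and B
from the cofactors and then substitutes them into the two equations
(18), each divided by N.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 array [[a, b], [c, d]]."""
    return a * d - b * c


def cofactors_row1(r12, r13, r23):
    """Cofactors R11, R12, R13 of the first row of R (r11 = r22 = r33 = 1)."""
    R11 = det2(1.0, r23, r23, 1.0)
    R12 = -det2(r12, r23, r13, 1.0)   # minus the minor of r12
    R13 = det2(r12, 1.0, r13, r23)    # minor of r13
    return R11, R12, R13


def regression_coefficients(s1, s2, s3, r12, r13, r23):
    """Coefficients A, B of x1 = A*x2 + B*x3 (variables measured from their means)."""
    R11, R12, R13 = cofactors_row1(r12, r13, r23)
    A = -(s1 / s2) * R12 / R11
    B = -(s1 / s3) * R13 / R11
    return A, B


# Hypothetical values, chosen only for illustration.
s1, s2, s3 = 2.0, 1.0, 4.0
r12, r13, r23 = 0.5, 0.4, 0.2

A, B = regression_coefficients(s1, s2, s3, r12, r13, r23)

# Left and right members of the two equations (18), after division by N.
lhs_1 = A * s2 ** 2 + B * s2 * s3 * r23
rhs_1 = s1 * s2 * r12
lhs_2 = A * s2 * s3 * r23 + B * s3 ** 2
rhs_2 = s1 * s3 * r13

print(A, B)
print(lhs_1, rhs_1, lhs_2, rhs_2)
```

If the cofactor expressions are correct, the two members of each
equation agree to within rounding error.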

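There are two analogous equations for the regression planes of $x_2$
on $x_3 x_1$, and $x_3$ on $x_1 x_2$, which can be obtained readily from
(19) by a cyclical permutation of the subscripts on x, σ, and R. They are

(20)    $\dfrac{x_2}{\sigma_2} R_{22} + \dfrac{x_3}{\sigma_3} R_{23} + \dfrac{x_1}{\sigma_1} R_{21} = 0$

when $x_2$ is the dependent variable, and

(21)    $\dfrac{x_3}{\sigma_3} R_{33} + \dfrac{x_1}{\sigma_1} R_{31} + \dfrac{x_2}{\sigma_2} R_{32} = 0$

when $x_3$ is the dependent variable. Referred to an arbitrary origin
(19) would have been

(19a)   $\dfrac{X_1 - \bar{X}_1}{\sigma_1} R_{11} + \dfrac{X_2 - \bar{X}_2}{\sigma_2} R_{12} + \dfrac{X_3 - \bar{X}_3}{\sigma_3} R_{13} = 0,$

where $x_i = X_i - \bar{X}_i$, $\bar{X}_i$ being the mean of $X_i$.
Analogous adjustments of (20) and (21) are obvious when the variables
are referred to an arbitrary origin.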

The three-dimensional case can now be generalized. By methods
similar to those employed in deriving (15) we can derive the linear
regression equation for k variables. Thus we have the hyperplane $x_1$
on $x_2, x_3, \ldots, x_k$,

(22)    $\dfrac{x_1}{\sigma_1} R_{11} + \dfrac{x_2}{\sigma_2} R_{12} + \cdots + \dfrac{x_k}{\sigma_k} R_{1k} = 0,$

where $R_{ij}$ is the cofactor of $r_{ij}$ in

(23)    $R = \begin{vmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ r_{21} & r_{22} & \cdots & r_{2k} \\ \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & \cdots & r_{kk} \end{vmatrix}.$

When expressed in standard units, (22) becomes

(22a)   $t_1 = -\dfrac{1}{R_{11}} \sum_{i=2}^{k} R_{1i} t_i,$

where $t_i = x_i / \sigma_i$. Then $t_1$ may be regarded as a weighted
mean of the contributions of the other variables. The factor $R_{1i}$
represents the force or weight of $t_i$ when all these variables are
given an opportunity to predict the value of $t_1$.
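
For k variables the weights in (22a) can be computed mechanically from
the cofactors. The sketch below is a minimal illustration, not a
recommended computing routine: the functions `det`, `cofactor`, and
`standard_weights` are names of our own choosing, the determinant is
developed by Laplace's expansion (adequate only for small k), and the
correlation array `r4` is hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small arrays only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cofactor(m, i, j):
    """Cofactor of the element in row i, column j (zero-based indices)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def standard_weights(r):
    """Weights -R_1i / R_11 (i = 2, ..., k) which multiply t_i in (22a)."""
    R11 = cofactor(r, 0, 0)
    return [-cofactor(r, 0, j) / R11 for j in range(1, len(r))]


# Hypothetical correlations among four variables, for illustration only.
r4 = [[1.0, 0.5, 0.4, 0.3],
      [0.5, 1.0, 0.2, 0.1],
      [0.4, 0.2, 1.0, 0.25],
      [0.3, 0.1, 0.25, 1.0]]

w4 = standard_weights(r4)
print(w4)
```

With three variables the same function returns the standardized forms
of A and B, namely $(r_{12} - r_{13} r_{23})/(1 - r_{23}^2)$ and
$(r_{13} - r_{12} r_{23})/(1 - r_{23}^2)$.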

3. Standard Error of Estimate. In Part I (Chapter VIII) we
learned that $S_y^2 = \sigma_y^2 (1 - r^2)$ was a measure of the closeness
with which the means of the x arrays of y clustered about the line of
regression of y on x. $S_y$ was called the standard error of estimate
and the larger r was, the smaller was $S_y$. We now seek an analogous
expression for the three-variable case. To this end let

(24)    $S_{1.23}^2 = \dfrac{1}{N} \sum_{1,2,3} \delta^2 f(x_1, x_2, x_3)$

where δ is the distance, measured parallel to the $x_1$-axis, between the
regression plane and the points $(x_1, x_2, x_3)$, $\sum_{1,2,3}$ denoting
a summation over all these points. That is, δ = (observed $x_1$ −
estimated $x_1$), the estimated $x_1$ being given by (19). Then we may
write

        $S_{1.23}^2 = \dfrac{\sigma_1^2}{N R_{11}^2} \sum \left( R_{11} \dfrac{x_1}{\sigma_1} + R_{12} \dfrac{x_2}{\sigma_2} + R_{13} \dfrac{x_3}{\sigma_3} \right)^2 f$

        $= \dfrac{\sigma_1^2 N}{N R_{11}^2} \left( R_{11}^2 + R_{12}^2 + R_{13}^2 + 2 R_{11} R_{12} r_{12} + 2 R_{11} R_{13} r_{13} + 2 R_{12} R_{13} r_{23} \right)$

        $= \dfrac{\sigma_1^2}{R_{11}^2} \{ R_{11} (R_{11} + r_{12} R_{12} + r_{13} R_{13}) + R_{12} (R_{12} + r_{12} R_{11} + r_{23} R_{13}) + R_{13} (R_{13} + r_{13} R_{11} + r_{23} R_{12}) \}.$

According to Laplace's development of a determinant, the elements
of any row (or column) and their corresponding cofactors may be
used to develop R. If, in the resulting expression, the elements
of this row (or column) are replaced by the corresponding elements
of some other row (or column) the expression vanishes. Therefore,
we have

(25)    $R_{11} + r_{12} R_{12} + r_{13} R_{13} = R,$    (a)
        $R_{12} + r_{12} R_{11} + r_{23} R_{13} = 0,$    (b)
        $R_{13} + r_{13} R_{11} + r_{23} R_{12} = 0.$    (c)

Using (25) in the above derivation we obtain

(26)    $S_{1.23}^2 = \sigma_1^2 \dfrac{R}{R_{11}}.$

This is a kind of average variance in $x_1$ columns of the observed
values of $x_1$ from its corresponding estimated values on the regression
plane (19). The square root of (26),

(26a)   $S_{1.23} = \sigma_1 \left( \dfrac{R}{R_{11}} \right)^{1/2},$

is called the standard error of estimating $x_1$ from assigned values of
$x_2$ and $x_3$.
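
Formula (26) may be compared with the defining sum (24) on a small set
of observations. The sketch below assumes six hypothetical points, each
with frequency f = 1; the variable names are ours. It fits the plane by
means of the cofactors, averages the squared residuals directly, and
computes $\sigma_1^2 R / R_{11}$ for comparison.

```python
from math import sqrt

# Hypothetical observations (x1, x2, x3); any small data set would serve.
data = [(2.0, 1.0, 5.0), (4.0, 2.0, 3.0), (5.0, 4.0, 4.0),
        (7.0, 5.0, 1.0), (9.0, 6.0, 2.0), (6.0, 3.0, 6.0)]
N = len(data)

means = [sum(p[i] for p in data) / N for i in range(3)]
dev = [[p[i] - means[i] for i in range(3)] for p in data]   # measured from the means
sig = [sqrt(sum(d[i] ** 2 for d in dev) / N) for i in range(3)]


def r(i, j):
    """Simple correlation between variables i and j (zero-based)."""
    return sum(d[i] * d[j] for d in dev) / (N * sig[i] * sig[j])


r12, r13, r23 = r(0, 1), r(0, 2), r(1, 2)

# Cofactors of the first row of R, and R itself.
R11 = 1 - r23 ** 2
R12 = -(r12 - r13 * r23)
R13 = r12 * r23 - r13
R = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2

A = -(sig[0] / sig[1]) * R12 / R11
B = -(sig[0] / sig[2]) * R13 / R11

direct = sum((d[0] - A * d[1] - B * d[2]) ** 2 for d in dev) / N   # definition (24)
formula = sig[0] ** 2 * R / R11                                    # result (26)

print(direct, formula, sqrt(formula))
```

The two values agree to within rounding error, and the last number
printed is the standard error of estimate (26a) for these data.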
4. Standard Deviation of Estimated Values. Next, we shall
obtain an expression analogous to $\sigma_{ey}$ of Part I (§ 7, Chapter
VIII) for the standard deviation of the estimated values given by (19).
The mean value of these estimates is zero since $x_i$ is measured from
its mean as origin. Therefore, the variance, $\sigma_{e1}^2$, of the
estimated values of $x_1$ is given by

(27)    $\sigma_{e1}^2 = \dfrac{1}{N} \sum \left[ -\dfrac{\sigma_1}{R_{11}} \left( R_{12} \dfrac{x_2}{\sigma_2} + R_{13} \dfrac{x_3}{\sigma_3} \right) \right]^2 f$

        $= \dfrac{\sigma_1^2}{R_{11}^2} ( R_{12}^2 + R_{13}^2 + 2 R_{12} R_{13} r_{23} )$

        $= \dfrac{\sigma_1^2}{R_{11}^2} \{ R_{12} (R_{12} + R_{13} r_{23}) + R_{13} (R_{13} + R_{12} r_{23}) \}$

        $= \dfrac{\sigma_1^2}{R_{11}^2} \{ -R_{12} R_{11} r_{12} - R_{13} R_{11} r_{13} \}$    by (b) and (c) of (25)

        $= \dfrac{\sigma_1^2}{R_{11}} (R_{11} - R)$    by (a) of (25).

Hence we have

(28)    $\sigma_{e1} = \sigma_1 \left( 1 - \dfrac{R}{R_{11}} \right)^{1/2}.$

If this result is to correspond to $\sigma_{ey} = r \sigma_y$ we would
expect that the factor $(1 - R/R_{11})^{1/2}$ would correspond in some way
with r. This is indeed the case and we shall now show that this factor is
the formula for the multiple correlation coefficient of $x_1$ on $x_2$
and $x_3$.

5. Multiple Correlation Coefficient. The ordinary correlation
coefficient between the observed values of $x_1$ and its corresponding
estimated values calculated from (19) is called the multiple
correlation coefficient of $x_1$ on $x_2$ and $x_3$. It is denoted by
$r_{1.23}$, so we have

        $r_{1.23} = \dfrac{\sum {}_o x_1 \, {}_e x_1 \, f}{N \sigma_1 \sigma_{e1}},$

where ${}_o x_1$ and ${}_e x_1$ denote the observed and estimated values,
respectively, of $x_1$. Using (19) this may be written in the form

        $N \sigma_1 \sigma_{e1} r_{1.23} = \sum x_1 (-\sigma_1) \left( \dfrac{R_{12}}{R_{11}} \dfrac{x_2}{\sigma_2} + \dfrac{R_{13}}{R_{11}} \dfrac{x_3}{\sigma_3} \right) f$

        $= \dfrac{N \sigma_1^2}{R_{11}} ( -R_{12} r_{12} - R_{13} r_{13} )$

        $= \dfrac{N \sigma_1^2}{R_{11}} ( R_{11} - R ).$

Making use of (28) in the above result we have the required formula

(29)    $r_{1.23} = \left( 1 - \dfrac{R}{R_{11}} \right)^{1/2}.$

By a cyclical permutation of the subscripts we can write at once the
formulas for the multiple correlation coefficients of $x_2$ on $x_1$ and
$x_3$, and of $x_3$ on $x_1$ and $x_2$. They are

(30)    $r_{2.31} = \left( 1 - \dfrac{R}{R_{22}} \right)^{1/2},$

(31)    $r_{3.12} = \left( 1 - \dfrac{R}{R_{33}} \right)^{1/2}.$

By writing (26) in the form

        $S_{1.23}^2 = \sigma_1^2 \left[ 1 - \left( 1 - \dfrac{R}{R_{11}} \right) \right]$

we obtain the formula

(32)    $S_{1.23}^2 = \sigma_1^2 (1 - r_{1.23}^2)$

which is quite analogous to the expression for $S_y^2$ in simple
correlation. It is clear from (32) that

(33)    $-1 \le r_{1.23} \le 1.$

Each of the formulas (29), (30), and (31) may be generalized for
k variables. Thus the multiple correlation coefficient of order $k - 1$
of $x_1$ with the other $k - 1$ variables is

(34)    $r_{1.23 \cdots k} = \left( 1 - \dfrac{R}{R_{11}} \right)^{1/2},$

where now $R_{ij}$ is the cofactor of $r_{ij}$ of R as defined in (23).
While a mathematical generalization gives a more complete and aesthetic
presentation, it is seldom that (22) or (34) are of value in practical
cases for more than four variables.

For computing purposes it is pleasant to know that multiple
correlation coefficients are expressible in terms of simple correlation
coefficients.

Example 1. Three variables have in pairs simple correlation coefficients
given by

        $r_{12} = .8, \quad r_{13} = -.7, \quad r_{23} = -.9.$

Find the multiple correlation coefficient $r_{1.23}$ of $x_1$ on $x_2$
and $x_3$.

Solution.

        $R = \begin{vmatrix} 1 & .8 & -.7 \\ .8 & 1 & -.9 \\ -.7 & -.9 & 1 \end{vmatrix} = .068, \qquad R_{11} = .19, \qquad r_{1.23} = .8013.$

Example 2. Suppose it is found that $r_{12} = .6$, $r_{13} = -.4$,
$r_{23} = .7$. Comment on these results.

Solution. $R = -.346$, $R_{11} = .51$, $r_{1.23} = 1.30$. Since this value
violates (33), the given coefficients are inconsistent. Inspecting the
given r's we observe that large values of $x_1$ are associated with large
values of $x_2$, but since $r_{13}$ is negative it would mean that small
values of $x_1$ go with large values of $x_3$, which is impossible when
$r_{12}$ and $r_{23}$ are positive and as large as these.
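
The arithmetic of Examples 1 and 2 can be reproduced in a few lines.
This is only a sketch; the function names `det_R` and
`multiple_r_squared` are ours, and the expansion of R used is the one
for three variables with unit diagonal.

```python
from math import sqrt


def det_R(r12, r13, r23):
    """Determinant R of the 3x3 correlation array with unit diagonal."""
    return 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


def multiple_r_squared(r12, r13, r23):
    """Square of r_1.23 from (29): 1 - R / R11, where R11 = 1 - r23^2."""
    return 1 - det_R(r12, r13, r23) / (1 - r23 ** 2)


# Example 1
R_1 = det_R(0.8, -0.7, -0.9)
r_1 = sqrt(multiple_r_squared(0.8, -0.7, -0.9))

# Example 2: R is negative, so the squared coefficient exceeds unity
R_2 = det_R(0.6, -0.4, 0.7)
rsq_2 = multiple_r_squared(0.6, -0.4, 0.7)

print(round(R_1, 3), round(r_1, 4))
print(round(R_2, 3), round(rsq_2, 3))
```

A negative R, as in Example 2, is a convenient signal that a set of
three simple correlation coefficients cannot have come from one set of
data.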


6. Limiting Cases. The following theorems are interesting in
themselves and shed light on interpretations of the theory in
applications.

Theorem I. The necessary and sufficient condition for coincidence
of the three regression planes (19), (20), and (21), is

(35)    $r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} = 1.$

Proof. From elementary analytic geometry, we know that a
necessary and sufficient condition that two equations of the first
degree represent the same plane is that their coefficients be
proportional. For our equations this will be true when

        $\dfrac{R_{11}}{R_{21}} = \dfrac{R_{12}}{R_{22}} = \dfrac{R_{13}}{R_{23}}$

and

        $\dfrac{R_{21}}{R_{31}} = \dfrac{R_{22}}{R_{32}} = \dfrac{R_{23}}{R_{33}}.$

When expressed in terms of $r_{ij}$ these relations, it will be found,
all reduce to (35).

An alternate proof is as follows. When $S_{1.23} = 0$ there is perfect
functional dependence between the variables, assuming linear
regression. It is evident from (26) that $S_{1.23}^2 = 0$ when $R = 0$.
Upon expanding R in terms of $r_{ij}$ and equating the result to zero we
obtain (35).

Corollary. Assuming linear regression, the criterion for perfect
correlation between three variables is given by (35).

Example 3. Given the following data, $r_{12} = .6$, $r_{13} = .4$. Find
the value of $r_{23}$ in order that $r_{1.23} = 1$.

Solution. Substituting the given values in (35) we have

        $r^2 - .48 r - .48 = 0,$

where the subscripts are dropped for the moment. Solving, we find
$r = .24 \pm .73$. So $r_{23} = .97$. (The other root, $r_{23} = -.49$,
also satisfies (35).)
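
The quadratic of Example 3 arises for any assigned pair $r_{12}$,
$r_{13}$. As a brief sketch, with a function name of our own choosing,
condition (35) may be solved for $r_{23}$ as follows.

```python
from math import sqrt


def r23_for_perfect_fit(r12, r13):
    """Both roots in r23 of (35): r23^2 - 2*r12*r13*r23 + (r12^2 + r13^2 - 1) = 0."""
    half_b = r12 * r13
    # The discriminant equals (1 - r12^2)(1 - r13^2), which is never negative.
    root = sqrt(half_b ** 2 - (r12 ** 2 + r13 ** 2 - 1))
    return half_b - root, half_b + root


low, high = r23_for_perfect_fit(0.6, 0.4)   # data of Example 3
print(round(low, 2), round(high, 2))
```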

The example shows that even though $r_{12}$ and $r_{13}$ are individually
small, it does not follow that there cannot be high correlation between
$x_1$, $x_2$, and $x_3$. Indeed two variables which individually with a
third variable have correlations which are apparently worthless for
predicting purposes may be very valuable when the three variables are
taken together and multiple regression employed. On the other
hand, it may be possible to get as good a prediction from $r_{12}$ or
$r_{13}$ using simple regression as from multiple regression. This
situation will be clarified by the following theorems.

Theorem II. If $r_{23} = 1$, then $r_{1.23}^2 = r_{12}^2 = r_{13}^2$, and
$S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2)$.

Proof. When $r_{23} = 1$ then $R_{11} = 0$ and it would appear from
(29) that $r_{1.23}$ then becomes infinite. But this is impossible by (33).
When $r_{23} = 1$ it will also happen that $r_{12} = r_{13}$. The student
can easily verify this by letting $r_{23} = 1$ in (25) and subtracting (c)
from (b) there. So we shall first see what (29) becomes when
$r_{12} = r_{13}$. If in (a) of (25) we let $r_{12} = r_{13}$ we obtain
$R - R_{11} = 2 r_{12} R_{12}$, since $R_{13}$ then equals $R_{12}$.
Substituting this result in (29) we soon have

        $r_{1.23}^2 = \dfrac{-2 r_{12} R_{12}}{R_{11}} = \dfrac{2 r_{12}^2}{1 + r_{23}},$

remembering that $r_{12} = r_{13}$. Now if we let $r_{23} = 1$ in the last
expression we obtain the first conclusion of the theorem. The second
conclusion follows from the first and formula (32).
In this case, then, multiple regression has no advantage over the
simple regression $x_1$ on $x_2$ or $x_1$ on $x_3$, because the standard
error is exactly what it would be if the third variable were not added.
Since $r_{23} = 1$, there is perfect linear dependence between $x_2$ and
$x_3$. Geometrically, all the data lie in the regression plane.



Theorem III. When $r_{23} = 0$ then $r_{1.23}^2 = r_{12}^2 + r_{13}^2$.

Proof. When $r_{23} = 0$ it is easy to show that $R_{11} = 1$ and
$R = 1 - r_{12}^2 - r_{13}^2$. So from (29) we have

        $r_{1.23}^2 = r_{12}^2 + r_{13}^2.$

The formula for the standard error of prediction then becomes

        $S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2 - r_{13}^2).$

Hence, when $x_2$ and $x_3$ are completely independent, multiple
regression gives a better prediction than would be given by either of the
simple regressions $x_1$ on $x_2$ or $x_1$ on $x_3$; very much better if
also $r_{12}$ and $r_{13}$ are nearly equal. If they are exactly equal
their maximum value is $(1/2)^{1/2} = .707$. This theorem shows that one
has a good regression equation for predicting when each of two variables
is highly correlated with the third variable but not with each other.
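
Theorems II and III can be illustrated numerically from (29). In the
sketch below the function name and the trial values of the r's are
assumptions made for illustration. Since $R_{11}$ vanishes when
$r_{23} = 1$, the case of Theorem II is approached by taking $r_{23}$
very near unity with $r_{12} = r_{13}$.

```python
def multiple_r_squared(r12, r13, r23):
    """Square of r_1.23 from (29), written out for three variables."""
    R = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1 - r23 ** 2
    return 1 - R / R11


# Theorem III: with r23 = 0 the result is r12^2 + r13^2.
theorem_3 = multiple_r_squared(0.5, 0.6, 0.0)

# Theorem II: with r12 = r13 and r23 close to 1 the result approaches r12^2.
theorem_2 = multiple_r_squared(0.5, 0.5, 0.999999)

print(theorem_3, 0.5 ** 2 + 0.6 ** 2)
print(theorem_2, 0.5 ** 2)
```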

7. Partial Correlation. It is often important to measure correlation
between two variables when the other variables have assigned
values. For the case of three variables, to which we limit our
attention, consider a slab parallel to the $x_1 x_2$ plane (Figure 13).
This is a sub-set of N which forms a two-way correlation table in which
the relations between $x_1$ and $x_2$ hold for a fixed value of $x_3$. The
correlation coefficient between $x_1$ and $x_2$ in this sub-distribution
is called the partial correlation coefficient between $x_1$ and $x_2$ for
the assigned $x_3$ and is conventionally denoted by

        $r_{12.3}.$

The regression curves for the table consisting of this sub-distribution
are called the partial regression curves. A classical example of a
partial correlation coefficient is the correlation between statures of
fathers and sons when the stature of the mother is a particular value,
say 62 inches.

In order to express $r_{12.3}$ in terms of the total correlations
$r_{ij}$, as we were able to do in the case of $r_{1.23}$, it will be
necessary to assume a theoretical or ideal situation. Suppose we are
dealing with a distribution for which the total regression curves are
straight lines and the regression surfaces are planes. Then the partial
regression line, $x_1$ on $x_2$, in our table at $x_3$ will be a section
of the regression plane, $x_1$ on $x_2 x_3$, because the line will contain
the mean points of all the $x_1$ columns, defined by the points
$(x_2, x_3)$, which lie in the table at $x_3$.

In the two variable case, described in Part I, we learned that
$S_y^2$ was an average of the variances in the x arrays of y taken over
all the values of x. Moreover, when the distribution was normal
we proved that these variances were constant and $S_y^2$ was precisely
this constant variance. The three variable case, in the ideal
distribution we are about to consider, is quite analogous. Recall that
$S_{1.23}^2$ could be regarded, in the ordinary case of linear regression,
as an average of the variances of $x_1$ in the several columns at
$(x_2, x_3)$ since, when regression is linear, the means of the columns
lie on the regression plane. Now let us assume that the distribution is
homoscedastic in the $x_1$ direction so that the variances in all the
columns of $x_1$ are the same. Under these assumptions, $S_{1.23}^2$ is
the variance in each column of $x_1$'s. Let $\sigma_{1.3}^2$ be the
variance of $x_1$ in the table at $x_3$. Remember that $r_{12.3}$ is the
correlation coefficient in this table and that regression is linear and
homoscedastic. Therefore, for the variance $S_{1.23}^2$ in each of the
columns of this table we may write

(36)    $S_{1.23}^2 = \sigma_{1.3}^2 (1 - r_{12.3}^2).$

Now consider the two-way table of totals $f(x_1, x_3)$. In this table,
$r_{13}$ is the total correlation between $x_1$ and $x_3$, and
$\sigma_{1.3}^2$ is the variance in an $x_3$ array of $x_1$'s. Since, under
our assumption, $\sigma_{1.3}^2$ is constant over these arrays, we may
write

(37)    $\sigma_{1.3}^2 = \sigma_1^2 (1 - r_{13}^2).$

From (32), (36), and (37) we obtain

        $\sigma_1^2 (1 - r_{1.23}^2) = \sigma_1^2 (1 - r_{13}^2)(1 - r_{12.3}^2),$

that is

        $(1 - r_{1.23}^2) = (1 - r_{13}^2)(1 - r_{12.3}^2),$

or

        $\dfrac{R}{R_{11}} = R_{22} (1 - r_{12.3}^2).$

Solving, we have

        $r_{12.3}^2 = \dfrac{R_{11} R_{22} - R}{R_{11} R_{22}}.$

By expanding the R's it is readily verified that

        $R_{11} R_{22} - R = (-R_{12})^2$

is an identity. Therefore, we have the final result

(38)    $r_{12.3} = \dfrac{-R_{12}}{(R_{11} R_{22})^{1/2}}.$

This may be written, if desired, in the form

(38a)   $r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\{ (1 - r_{13}^2)(1 - r_{23}^2) \}^{1/2}}.$

By letting $\sin \theta = r$, it is seen that tables of
$\cos \theta = (1 - r^2)^{1/2}$ will facilitate the computation of (38a)
in numerical problems.

Since (38a) does not involve $x_3$, the value of $r_{12.3}$ for one
assignment of $x_3$ is the same as for any other assigned value of $x_3$.
Therefore, not only must the distribution be homoscedastic in the $x_1$
direction, but also the value of $r_{12.3}$ in all slabs perpendicular to
the $x_3$-axis must be the same. It is fairly obvious that these
conditions would not, ordinarily, be satisfied in practical applications.
So, in the applications, $r_{12.3}$ is regarded as a sort of average value
of the partial correlations which could be obtained for all assignments
of $x_3$. The chief use of partial correlation is in testing what the
correlation between two variables would be if the third variable were
not interfering with the relationship.
Example 4. In a study of the factors which influence academic success,
May* obtained the following results (among others) based on the records
of 450 students at Syracuse University.

    $x_1$ = honor points,           $\bar{X}_1 = 18.5$,    $\sigma_1 = 11.2$,   $r_{12} = .60$,
    $x_2$ = general intelligence,   $\bar{X}_2 = 100.6$,   $\sigma_2 = 15.8$,   $r_{13} = .32$,
    $x_3$ = hours of study,         $\bar{X}_3 = 24$,      $\sigma_3 = 6$,      $r_{23} = -.35$.

One purpose of the study was to find to what extent honor points were
related to general intelligence, when hours of study (per week) are held
constant. Using (38a) it is found that $r_{12.3} = .802$.
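
The value quoted in Example 4 may be verified directly. The sketch
below evaluates both (38a) and the cofactor form (38); the function
names are ours and are introduced only for this illustration.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Partial correlation r_12.3 computed from (38a)."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def partial_r_cofactors(r12, r13, r23):
    """The same quantity from (38): -R12 / (R11 * R22)^(1/2)."""
    R11 = 1 - r23 ** 2
    R22 = 1 - r13 ** 2
    R12 = -(r12 - r13 * r23)
    return -R12 / sqrt(R11 * R22)


r_12_3 = partial_r(0.60, 0.32, -0.35)               # data of Example 4
check = partial_r_cofactors(0.60, 0.32, -0.35)
print(round(r_12_3, 3), round(check, 3))
```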


8. An Alternate Derivation. It is useful to approach the subject
of partial correlation from another point of view. Assume, as before,
that the variables $x_1$, $x_2$, $x_3$, are referred to the general mean
as origin. Suppose that we wish to know what the correlation between
$x_1$ and $x_2$ would be if the influence of $x_3$ were eliminated. Let us
subtract from the $x_1$ of each point that part of $x_1$ which is due to
the influence of $x_3$ as indicated by the regression line $x_1$ on $x_3$
and denote the residual by $x_{1.3}$. Then subtract from the $x_2$ of each
point that part of $x_2$ which is due to $x_3$ as indicated by the
regression line $x_2$ on $x_3$ and denote the residual by $x_{2.3}$. Thus
we have

(39)    $x_{1.3} = x_1 - r_{13} \dfrac{\sigma_1}{\sigma_3} x_3,$
        $x_{2.3} = x_2 - r_{23} \dfrac{\sigma_2}{\sigma_3} x_3.$


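We shall now prove that the simple correlation coefficient between
$x_{1.3}$ and $x_{2.3}$ is precisely $r_{12.3}$. By definition, this simple
correlation coefficient would be

(40)    $\dfrac{\sum x_{1.3} x_{2.3} f(x_1, x_2, x_3)}{N \sigma_{1.3} \sigma_{2.3}}.$

Making use of (39), the numerator of (40) becomes

        $\sum x_1 x_2 f - r_{13} \dfrac{\sigma_1}{\sigma_3} \sum x_2 x_3 f - r_{23} \dfrac{\sigma_2}{\sigma_3} \sum x_1 x_3 f + r_{13} r_{23} \dfrac{\sigma_1 \sigma_2}{\sigma_3^2} \sum x_3^2 f$

        $= N ( \sigma_1 \sigma_2 r_{12} - \sigma_1 \sigma_2 r_{13} r_{23} - \sigma_1 \sigma_2 r_{13} r_{23} + \sigma_1 \sigma_2 r_{13} r_{23} )$

        $= N \sigma_1 \sigma_2 ( r_{12} - r_{13} r_{23} ).$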

* Mark A. May, Predicting Academic Success, Journal of Educational
Psychology, 1923, vol. 14, no. 7, pp. 429-440.
Now by (37),

        $\sigma_{1.3} = \sigma_1 (1 - r_{13}^2)^{1/2},$

and similarly,

        $\sigma_{2.3} = \sigma_2 (1 - r_{23}^2)^{1/2}.$

Inserting these results in (40) we obtain the promised result

        $r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\{ (1 - r_{13}^2)(1 - r_{23}^2) \}^{1/2}}.$

When interpreted according to this derivation, $r_{12.3}$ is sometimes
called the "net" correlation between $x_1$ and $x_2$.
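
The residual method of this section lends itself to a direct numerical
check. The sketch below is illustrative only: the six observations are
hypothetical, each is given frequency f = 1, and the helper names `sd`
and `corr` are ours. It forms the residuals (39), correlates them as in
(40), and compares the result with (38a).

```python
from math import sqrt

# Hypothetical observations (x1, x2, x3), used only for illustration.
data = [(2.0, 1.0, 5.0), (4.0, 2.0, 3.0), (5.0, 4.0, 4.0),
        (7.0, 5.0, 1.0), (9.0, 6.0, 2.0), (6.0, 3.0, 6.0)]
N = len(data)
means = [sum(p[i] for p in data) / N for i in range(3)]
dev = [[p[i] - means[i] for i in range(3)] for p in data]


def sd(values):
    """Standard deviation of values already measured from their mean."""
    return sqrt(sum(v * v for v in values) / len(values))


def corr(u, v):
    """Simple correlation of two lists, each measured from its mean."""
    return sum(a * b for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))


x1 = [d[0] for d in dev]
x2 = [d[1] for d in dev]
x3 = [d[2] for d in dev]
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

# Residuals (39): remove from x1 and x2 the parts indicated by their
# regression lines on x3.
x1_3 = [a - r13 * sd(x1) / sd(x3) * c for a, c in zip(x1, x3)]
x2_3 = [b - r23 * sd(x2) / sd(x3) * c for b, c in zip(x2, x3)]

net = corr(x1_3, x2_3)                                               # from (40)
formula = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))  # (38a)
print(net, formula)
```

The agreement of the two printed values does not depend on the ideal
assumptions of section 7; it is an algebraic identity for any set of
observations.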
Interesting interpretations of multiple and partial correlation
in terms of spherical trigonometry will be found in the following
references:

1. Burgess, The Mathematics of Statistics, pp. 266-267; Houghton Mifflin Co.

2. Jackson, The Trigonometry of Correlation, Journal of the American
   Mathematical Association, vol. 31, pp. 275-280.


Exercises 

1. Find the multiple correlation coefficients and the regression
   equations for the data in Example 4.

2. (Garrett) The r for intelligence and school achievement in a group of
   children 8 to 14 years old is .80. The r for intelligence and age in the
   same group is .70. The r for school achievement and age is .60. What
   will be the correlation between intelligence and school achievement in
   children of the same age?

3. (Yule and Kendall) The following means, standard deviations, and
   correlations are found for

       $X_1$ = seed-hay crops in cwts. per acre,
       $X_2$ = spring rainfall in inches,
       $X_3$ = accumulated temperature above 42° F. in spring,

   in a certain district in England during 20 years.

       $\bar{X}_1 = 28.02$,   $\sigma_1 = 4.42$,   $r_{12} = .80$,
       $\bar{X}_2 = 4.91$,    $\sigma_2 = 1.10$,   $r_{13} = -.40$,
       $\bar{X}_3 = 594$,     $\sigma_3 = 85$,     $r_{23} = -.56$.

   Find the partial correlations and the regression equation for hay crop
   on spring rainfall and accumulated temperature.

4. Derive and explain the relation $\sigma_1^2 = \sigma_{e1}^2 + S_{1.23}^2$.
   What is the corresponding relation in simple correlation?
5. The following data relate to land values and crops in twenty-five Iowa
   counties.

       $X_1$ = average value per acre of farm land on January 1, 1920,
       $X_2$ = average yield of corn per acre in bushels 1910-1919,
       $X_3$ = per cent of farm land in small grain,
       $X_4$ = per cent of farm land in corn.

   County No.   X1 (dollars)    X2    X3    X4
        1            87         40    11    14
        2           133         36    13    30
        3           174         34    19    30
        4           386         41    33    39
        5           363         39    25    33
        6           274         42    23    34
        7           236         40    22    37
        8           104         31     9    20
        9           141         36    13    27
       10           208         34    17    40
       11           116         30    18    19
       12           271         40    23    31
       13           163         37    14    25
       14           193         41    13    28
       15           203         38    24    31
       16           279         38    31    35
       17           179         24    16    26
       18           244         45    19    34
       19           165         34    20    30
       20           257         40    30    38
       21           262         41    22    35
       22           280         42    21    41
       23           167         35    16    23
       24           168         33    18    24
       25           115         36    18    21


   (a) Find the linear regression equation of $X_1$ on $X_2 X_3 X_4$.

   (b) Estimate the first five values of $X_1$, using the equation obtained in (a).

   (c) Calculate $\sigma_{e1}$ and $r_{1.234}$.



CHAPTER VI 


FUNDAMENTALS OF SAMPLING THEORY WITH SPECIAL 
REFERENCE TO THE MEAN 

1. Introduction.* To emphasize the viewpoint of the subject of
this chapter it is convenient to recognize two general classes of
problems in mathematical statistics. In problems of the first class our
concern is largely with the exposition of methods of characterizing
observed data. Thus in the first class would fall methods for
summarizing the pertinent information in a set of variates by means of
averages, measures of dispersion, indices of correlation, etc. In
problems of the second class, however, the data at hand are regarded
as a random sample drawn from a well-defined class of variates called
the population or universe of discourse, and we are concerned with
drawing inferences about the universe from the sample. By a sample,
more precisely a random sample, we mean a sub-set of variates in
which each individual from the universe has an equal and independent
chance to be included. From this chosen sample we attempt to draw
inferences concerning the universe. In order to deal with this
inductive argument we first consider a deductive argument; that is,
we first consider an infinite (or finite) universe and investigate the
behavior of samples according to the laws of probability. The
methodology dealing with this class of problems is known as sampling
theory. Although the two classes of problems are not entirely
distinct with regard to their treatment, the center of interest in
sampling theory is the development of criteria for assisting common
sense or educated judgment concerning the magnitude of chance
fluctuations in statistical ratios, averages, and coefficients.

The Bernoulli theory deals with sampling fluctuations in relative
frequencies. In the words of Professor Rietz,¹

But it is fairly obvious that the interest of the statistician in the effects of
sampling fluctuations extends far beyond the fluctuations in relative frequencies.
To illustrate, suppose we calculate any statistical measure such as an arithmetic
mean, median, standard deviation, correlation coefficient, or parameter of a
frequency function from the actual frequencies given by a sample of data. If we
need then either to form a judgement as to the stability of such results from sample
to sample or to use the results in drawing inferences about the sampled population,
the common sense process of induction involved is much aided by a knowledge of
the general order of magnitude of the sampling discrepancies which may
reasonably be expected because of the limited size of the sample from which we
have calculated our statistical measures.

* A reference list is given at the end of each of the following chapters to which
attention is directed in the course of the discussion by the use of superscripts.

A statistical measure calculated from the actual frequencies given
by a sample has been called a statistic by R. A. Fisher.² This is to
avoid a verbal confusion with the corresponding parameter in the
universe which we should like to know but can generally only
estimate. It is a matter of common experience that a statistic will
vary from sample to sample. To characterize the variation that
may be tolerated on the basis of chance is one of the fundamental
problems of sampling theory.

In discussing such sampling fluctuations, Fisher³ introduces the
subject as follows:

The idea of an infinite population distributed in a frequency distribution in 
respect of one or more characters is fundamental to all statistical work. From 
a limited experience, for example, of individuals of a species, or of the weather 
of a locality, we may obtain some idea of the infinite hypothetical population 
from which our sample is drawn, and so of the probable nature of future samples 
to which our conclusions are to be applied. If a second sample belies this
expectation we infer that it is, in the language of statistics, drawn from a different
population; that the treatment to which the second sample of organisms had 
been exposed did in fact make a material difference, or that the climate (or 
methods of measuring it) had materially altered. Critical tests of this kind 
may be called tests of significance, and when such tests are available we may 
discover whether a second sample is or is not significantly different from the first. 

2. Method of Attack. The whole theory of sampling is based on
frequency distributions and probability. In order to explain the
tests of significance that have been developed, it is desirable to
outline briefly the philosophy underlying the method of attack.

Sampling theory deals with specific questions like the following:
Given the mean and standard deviation of a sample of N variates,
how reliable are these estimates of the population mean and standard
deviation, respectively? Given two samples, do their respective
means or other statistics differ significantly? Can the differences
be accounted for on the basis of chance or do the samples come from
different populations? The answers require in general that we
conceive the universe as one distribution, the values of the statistic
calculated from all possible samples of size N from that universe
as another distribution, and that there are mathematical expressions
capable of representing both distributions. This is the chief reason
for studying frequency curves and probability distributions.

Suppose, for example, that we have computed a statistic — say the 
mean of 100 observations or measurements. What we get is not an 
absolutely fixed quantity which may be exactly reproduced again 
by taking 100 similar measurements. Indeed, if such an experiment 
were repeated many times, we would get values for the arithmetic 
mean which would form a frequency distribution. This distribution 
would have its own mean (mean of means), standard deviation, and 
higher moments. The law describing the frequency distribution of 
all possible means of samples of size N from a specified universe is 
called a distribution function when it can be expressed mathematically.
Its graph is called the curve of means. What has been said
of the mean holds similarly for any other statistic. 
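
The repeated experiment just described can be imitated on a computer.
The sketch below is an illustration under assumptions of our own: the
universe is taken to be uniform on (0, 1), so that its mean is 1/2, and
2000 samples of 100 variates are drawn with a fixed seed. It records the
mean of each sample and then summarizes the resulting distribution of
means.

```python
import random
from math import sqrt

random.seed(1)            # fixed seed so that the run can be repeated

N = 100                   # size of each sample
TRIALS = 2000             # number of samples drawn


def sample_mean(n):
    """Mean of n variates drawn from the assumed universe, uniform on (0, 1)."""
    return sum(random.random() for _ in range(n)) / n


means = [sample_mean(N) for _ in range(TRIALS)]

# The distribution of means has its own mean and standard deviation.
mean_of_means = sum(means) / TRIALS
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in means) / TRIALS)

print(mean_of_means, sd_of_means)
```

A histogram of the list `means` would give an empirical picture of the
curve of means for this universe and this sample size.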

Formulation of statistical judgment about a sample involves the 
specification of the universe and the determination of the distribution 
function of a given statistic in samples of a given size drawn from 
this universe. The problem of determining the distribution functions 
for the various statistics from specified universes is one which has 
challenged modern mathematical research. In most cases it has
been necessary to assume that the parent universe is of the normal 
form in order to obtain analytically the sampling distribution of the 
statistic. Many of the tests of significance are based upon this 
assumption. However, considerable information about sampling 
distributions from arbitrary universes is known in terms of their 
moments or expected values. 

3. Expected Values. Let the continuous variable x be subject to
the distribution function $f(x)$ and let $\phi(x)$ be an arbitrary function
of x. Then the expected value of $\phi(x)$, denoted by application of the
operator E, is defined by

(1)     $E\{\phi(x)\} = \int_{-\infty}^{\infty} \phi(x) f(x)\,dx,$

provided this integral exists. In particular, if $\phi(x) = x^k$,
$(k = 1, 2, \ldots)$, we have

        $E(x^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx.$

For k = 1, this defines the mean of the x's in the universe represented
by $f(x)$. Hereafter we will denote the mean of a universe of x's by
$\tilde{x}$ and restrict $\bar{x}$ to denote the mean of a sample from that
universe. Therefore, we may write *

(2)     $E(x) = \tilde{x}.$

If $\phi(x) = (x - \tilde{x})^2$, we have the variance of x,

(3)     $\sigma_x^2 = E(x - \tilde{x})^2 = E(x^2) - \tilde{x}^2.$

The (positive) square root of $\sigma_x^2$ is called the standard deviation
or standard error of the distribution of x. Analogous definitions hold,
of course, for y.
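
Definitions (1) to (3) may be illustrated by numerical integration. In
the sketch below the density $f(x) = 2x$ on (0, 1), zero elsewhere, is
an assumption chosen for convenience, and `expected_value` is a name of
our own; the integral is approximated by the midpoint rule. For this
density the mean is 2/3 and the variance is 1/18.

```python
def expected_value(phi, f, a, b, steps=20000):
    """Approximate E{phi(x)}, the integral of phi(x) f(x) dx over (a, b), by the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += phi(x) * f(x)
    return total * h


def f(x):
    """Assumed density: f(x) = 2x on (0, 1) and zero elsewhere."""
    return 2.0 * x


mean = expected_value(lambda x: x, f, 0.0, 1.0)                    # (2)
second = expected_value(lambda x: x ** 2, f, 0.0, 1.0)             # E(x^2)
variance = expected_value(lambda x: (x - mean) ** 2, f, 0.0, 1.0)  # (3), first form

print(mean, variance, second - mean ** 2)
```

The last two numbers printed are the two forms of the variance given
in (3).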

If the variables x and y are simultaneously distributed in accord
with the function $f(x, y)$, then

        $E(xy) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f(x, y)\,dy\,dx.$

If x and y are not independent variables in the probability sense,
then, as we have seen in Chapter IV, $f(x, y) \ne g(x) h(y)$ where $g(x)$
and $h(y)$ are the marginal distributions of x and y, respectively.
The correlation coefficient, ρ, between x and y in the bivariate
universe represented by $f(x, y)$ is defined by

(4)     $\rho = \dfrac{E(xy) - \tilde{x} \tilde{y}}{\sigma_x \sigma_y}.$

The quantities $\tilde{x}$, σ, ρ, etc., relating to a universe are called
parameters.

The following propositions may easily be established from
preceding definitions so they are stated without proof.

I. The expected value of the product of a variable and a constant is
equal to the product of the constant and the expected value of the variable.
That is,

        $E(cx) = cE(x).$

II. The expected value of deviations of a variable from its expected
value is zero. That is,

        $E(x - \tilde{x}) = 0.$

* $\tilde{x}$ is read "x tilde."

III. The expected value of the sum of two or more variables is the sum
of their expected values. In symbols,

        $E(x + y + z) = E(x) + E(y) + E(z).$

IV. If x and y are mutually independent variables in the probability
sense, then the expected value of their product is equal to the product of
their expected values. That is,

        $E(xy) = E(x)E(y).$

V. The expected value of the product of deviations of two mutually
independent variables from their respective expected values is zero.
That is,

        $E[(x - \tilde{x})(y - \tilde{y})] = 0.$

VI. The expected value of the product of deviations of two correlated
variables from their respective expected values is given by

        $E[(x - \tilde{x})(y - \tilde{y})] = \rho \sigma_x \sigma_y.$

4. Standard Error of a Linear Function of Variables. Suppose 
a variable is a linear function of two or more independent * variables 
each of which may take on a universe of values and we require the 
standard error of this function in terms of certain moments of the 
underlying distributions of independent variables. To this end let 

(5)    $w = c_1 x_1 + c_2 x_2 + \cdots + c_N x_N,$

where each variable $x_k$, (k = 1, 2, ..., N), is arbitrarily distributed
and where the c's are arbitrary constants. Let $\sigma_k$ represent the
standard error of $x_k$ in the universe to which it belongs, and let $\rho_{ij}$
represent the correlation coefficient (if any correlation exists) between
$x_i$ and $x_j$. We seek the standard error of w, $\sigma_w$, in terms of $\sigma_k$ and
$\rho_{ij}$, (i = 1 to N, j = 1 to N).

Case I. We will suppose first that the variables in the several
universes are correlated, that is, that $\rho_{ij} \neq 0$ for every combination
of i and j. From (5) and Proposition III we have

(6)    $E(w) = c_1 E(x_1) + c_2 E(x_2) + \cdots + c_N E(x_N),$

that is

(7)    $\tilde{w} = c_1 \tilde{x}_1 + c_2 \tilde{x}_2 + \cdots + c_N \tilde{x}_N.$

* We are using the phrase “independent variables” here in the ordinary
sense of analysis to designate the variables on which a special function depends,
without any implication that these variables are independent of each other in
the statistical sense.

Then

$E(w - \tilde{w})^2 = \sum_i c_i^2 E(x_i - \tilde{x}_i)^2 + \sum_{i \neq j} c_i c_j E[(x_i - \tilde{x}_i)(x_j - \tilde{x}_j)],$

which by definition (3) and Proposition VI becomes

(8)    $\sigma_w^2 = \sum_i c_i^2 \sigma_i^2 + \sum_{i \neq j} c_i c_j \rho_{ij} \sigma_i \sigma_j.$

If $c_1 = 1$, $c_2 = \pm 1$, and N = 2, we have as a special case

(9)    $\sigma_w^2 = \sigma_1^2 \pm 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2.$

Case II. Suppose the x's in (5) are mutually independent in the
statistical sense so that $\rho_{ij} = 0$. Then (8) becomes

(10)    $\sigma_w^2 = c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_N^2\sigma_N^2.$
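As a rough sketch of how (8), (9), and (10) might be evaluated numerically, the function below sums the squared terms and then the cross terms over all ordered pairs i ≠ j. The function name and the numerical values are assumptions chosen for illustration.

```python
def linear_combination_variance(c, sigma, rho=None):
    """Variance of w = sum(c[i] * x[i]) following (8).

    c     -- constants c_1..c_N
    sigma -- standard errors sigma_1..sigma_N
    rho   -- optional N x N matrix of correlation coefficients rho_ij;
             when omitted the variables are treated as uncorrelated, as in (10)
    """
    n = len(c)
    total = sum((c[i] * sigma[i]) ** 2 for i in range(n))
    if rho is not None:
        for i in range(n):
            for j in range(n):
                if i != j:  # ordered pairs, so each pair of variables contributes twice
                    total += c[i] * c[j] * rho[i][j] * sigma[i] * sigma[j]
    return total

# Special case (9) for w = x1 - x2 with illustrative values
# sigma_1 = 3, sigma_2 = 4, rho_12 = 0.5.
rho = [[1.0, 0.5], [0.5, 1.0]]
var_diff = linear_combination_variance([1, -1], [3, 4], rho)   # 9 - 12 + 16
var_indep = linear_combination_variance([1, -1], [3, 4])       # 9 + 16
print(var_diff, var_indep)
```

Summing over ordered pairs is what produces the factor 2 in the middle term of (9).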

5. Theorems. Relations (6)-(10) enable us to prove some interesting
and useful theorems about the distribution of means of
samples from an arbitrary universe. The following definition will
make the notion of sample precise.

Definition. Let $(x_1, x_2, \ldots, x_N)$ be a set of N independent variables
each subject to the same distribution function g, so that their joint
distribution function is

$f(x_1, x_2, \ldots, x_N) \equiv g(x_1) g(x_2) \cdots g(x_N).$

Then $(x_1, x_2, \ldots, x_N)$ is called a random sample of N from a universe
with distribution function g(x).

Table 6 exhibits the notation which will be used for the moments 
of the several distributions referred to in Theorems I-III. 


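Table 6. Notation

                      Universe          Sample        Distribution of Means
Mean                  $\tilde{x}$       $\bar{x}$     $E(\bar{x})$
Standard Deviation    $\sigma_x$        $s$           $\sigma_{\bar{x}}$
Variance              $\sigma_x^2$      $s^2$         $\sigma_{\bar{x}}^2$
Skewness              $\alpha_{3:x}$    $a_{3:x}$     $\alpha_{3:\bar{x}}$
Kurtosis              $\alpha_{4:x}$    $a_{4:x}$     $\alpha_{4:\bar{x}}$
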

Theorem I. If samples of size N be drawn from an arbitrary
universe and if $\bar{x}$ be the mean of a sample, then the mean of all possible
such means equals the mean of the universe. That is,

(11)    $E(\bar{x}) = \tilde{x}.$

Proof. In (5), let $c_1 = c_2 = \cdots = c_N = 1/N$ and let $x_1, x_2, \ldots, x_N$
constitute a sample from a universe with mean $\tilde{x}$ and variance
$\sigma_x^2$. Then $w = \bar{x}$. As a consequence of the definition of sample,
$E(x_i) = \tilde{x}$ for each value of i from one to N. Therefore, (6) gives us
$E(\bar{x}) = \tilde{x}$.

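Theorem II. The variance of the sampling distribution of means
from an arbitrary universe equals the variance of the universe divided
by the number in the samples. In symbols,

(12)    $\sigma_{\bar{x}}^2 = \dfrac{\sigma_x^2}{N}.$

Hence,

(12a)    $\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{N}}.$

Proof. As in the proof of Theorem I let $w = \bar{x}$. Then (10)
becomes

(13)    $\sigma_{\bar{x}}^2 = \dfrac{1}{N^2}(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2).$

Since the x's constitute a sample, $\sigma_i = \sigma_x$ for each value of i from
1 to N. So (13) reduces to (12).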

Theorem III. The moments describing skewness and kurtosis in the
sampling distribution of means are related to the corresponding moments
in the universe by the following formulas:

(14)    $\alpha_{3:\bar{x}} = \dfrac{\alpha_{3:x}}{\sqrt{N}},$

        $\alpha_{4:\bar{x}} = 3 + \dfrac{1}{N}(\alpha_{4:x} - 3).$


A proof of (14) could be given by developing and applying addi¬ 
tional propositions on expected values. However, this method is 
tedious for the higher moments. A more elegant proof can be given 
by means of characteristic functions.^ Such a proof has been made 
available by Shewhart® for the discrete case. 

The first and second theorems show us that in repeated samples,
$\bar{x}$ is distributed about $\tilde{x}$ with standard deviation $\sigma_x/\sqrt{N}$. Theorem
III tells us something about the form of the distribution. Thus if
the universe is normal so that $\alpha_{3:x} = 0$ and $\alpha_{4:x} = 3$, then from (14)
we see that $\alpha_{3:\bar{x}} = 0$ and $\alpha_{4:\bar{x}} = 3$, so the sampling distribution of $\bar{x}$
from a normal universe has the normal values for skewness and kurtosis.

In the next three theorems it will be understood that x and y are
correlated variables which are jointly distributed in accord with an
arbitrary function f(x, y) in which the parameters are $\tilde{x}$, $\tilde{y}$, $\sigma_x$, $\sigma_y$,
and $\rho$.

Theorem IV. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ be a sample of
N pairs drawn independently from the distribution characterized by
f(x, y) and let $(\bar{x}, \bar{y})$ be the mean of a sample. The correlation coefficient,
R, between the means of all possible such samples equals $\rho$.

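Proof. By definition,

(15)    $R = \dfrac{E(\bar{x}\bar{y}) - \tilde{x}\tilde{y}}{\sigma_{\bar{x}}\sigma_{\bar{y}}}$

and

$E(\bar{x}\bar{y}) = \dfrac{1}{N^2} E\{(x_1 + x_2 + \cdots + x_N)(y_1 + y_2 + \cdots + y_N)\} = \dfrac{E(S)}{N^2},$

where

$S = x_1 y_1 + x_1 y_2 + \cdots + x_1 y_N$
$\quad + x_2 y_1 + x_2 y_2 + \cdots + x_2 y_N$
$\quad + \cdots$
$\quad + x_N y_1 + x_N y_2 + \cdots + x_N y_N.$

We will separate S into two parts, conveniently called u and v, where

$u = x_1 y_1 + x_2 y_2 + \cdots + x_N y_N,$

and

v = sum of $(N^2 - N)$ terms of the form $x_i y_j$, $i \neq j$.

Then

$E(u) = E\{\sum_1^N (x_i y_i)\} = \sum_1^N \{E(x_i y_i)\} = N E(xy).$

In v, $x_i$ must be uncorrelated with $y_j$ since $i \neq j$. Therefore,

$E(x_i y_j) = E(x_i) E(y_j) = \tilde{x}\tilde{y},$

and

$E(v) = (N^2 - N)\tilde{x}\tilde{y}.$

So we have

$E(S) = N E(xy) + (N^2 - N)\tilde{x}\tilde{y},$

and therefore

(16)    $E(\bar{x}\bar{y}) = \dfrac{1}{N}\{E(xy) + (N - 1)\tilde{x}\tilde{y}\}.$

Making use of Theorem II and (16) the right member of (15) reduces
to the definition of $\rho$. Indeed, by (16) the numerator of (15) is
$\{E(xy) - \tilde{x}\tilde{y}\}/N$, and by Theorem II the denominator is $\sigma_x\sigma_y/N$.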

Theorem V. Let $\bar{x}$ be the mean of a sample of N from g(x) and let
$\bar{y}$ be the mean of a sample of N from h(y) where g(x) and h(y) are the
marginal distributions of the universe characterized by f(x, y) of correlated
variables. Let $w = \bar{x} - \bar{y}$. The variance of the sampling
distribution of w is

(17)    $\sigma_w^2 = \dfrac{1}{N}(\sigma_x^2 - 2\rho\sigma_x\sigma_y + \sigma_y^2).$

The proof follows from (9) and Theorem IV.

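Theorem VI. Let $\bar{x}$ and $\bar{y}$ be the means, $s_x$ and $s_y$ the standard
deviations, and r the correlation coefficient in a sample of N correlated
items. Suppose N is so large that $s^2$ is a good estimate * of $\sigma^2$ and r of $\rho$,
so that we may write

$\sigma_{\bar{x}}^2 = \dfrac{s_x^2}{N}, \qquad \sigma_{\bar{y}}^2 = \dfrac{s_y^2}{N}, \qquad \rho = r.$

The variance of the sampling distribution of $w = \bar{x} - \bar{y}$ may be computed
from the sample by the formula

(18)    $\sigma_w^2 = \dfrac{1}{N^2}\left\{\sum (x_i - y_i)^2 - \dfrac{\left[\sum (x_i - y_i)\right]^2}{N}\right\}.$

The proof follows from (17).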
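A small numerical check may make the step from (17) to (18) more concrete. The sketch assumes (18) in the form $\sigma_w^2 = \{\sum (x_i - y_i)^2 - [\sum (x_i - y_i)]^2/N\}/N^2$ and assumes that the sample variances use the divisor N; the paired values and function names are made up for illustration.

```python
def variance_of_mean_difference(xs, ys):
    """Formula (18), as assumed above: variance of w = x-bar - y-bar from N pairs."""
    n = len(xs)
    d = [x - y for x, y in zip(xs, ys)]
    return (sum(v * v for v in d) - sum(d) ** 2 / n) / n ** 2

def variance_from_moments(xs, ys):
    """Formula (17) with s_x, s_y, and r substituted for sigma_x, sigma_y, and rho."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx2 = sum((x - mx) ** 2 for x in xs) / n
    sy2 = sum((y - my) ** 2 for y in ys) / n
    # The sample product moment equals r * s_x * s_y.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return (sx2 - 2 * sxy + sy2) / n

xs = [12.0, 15.0, 11.0, 18.0, 14.0]   # illustrative paired measurements
ys = [10.0, 14.0, 12.0, 15.0, 13.0]
print(variance_of_mean_difference(xs, ys), variance_from_moments(xs, ys))
```

The first function needs only the differences of the pairs, which is the practical advantage of (18) over computing $s_x$, $s_y$, and r separately.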

6. An Experiment. We will now describe an exercise in experi¬ 
mental sampling which will help make the theory more meaningful. 
It was performed by a class of thirty students who took the distribution
of Table 7 as a universe.

In a box were placed 2000 discs † each bearing a number from the
set 1, 2, 3, ..., 25. The numbers on the discs were coded to the

* The problem of estimation is discussed in the next chapter.

† Small metal rimmed price tags were used. Ideally, each individual disc
should be returned to the box before the next is drawn. However, this was not
insisted upon and an entire sample may have been drawn before replacement.

Table 7. Span among Adult Males. (See Table 20, Part I)

   x        f
 58.5       1
 59.5       2
 60.5       1
 61.5       6
 62.5       7
 63.5      22
 64.5      65
 65.5     111
 66.5     146
 67.5     182
 68.5     229
 69.5     265
 70.5     263
 71.5     217
 72.5     176
 73.5     132
 74.5      82
 75.5      48
 76.5      20
 77.5      16
 78.5      12
 79.5       3
 80.5       1
 81.5       2
 82.5       1


span values in accordance with the scheme shown on page 107, and
the frequency of the variously numbered discs equaled the frequency
of the corresponding x's. Each member of the class drew
samples from the box according to the following directions.

Directions 

1. Intermix the discs thoroughly and withdraw four random 
samples of ten discs each. 

2. Record the numbers in each sample of ten on the sampling record 
sheet (page 107); replace the discs in the box. 

3. For each sample of ten: find (a) mean span, (b) variance, (c) 
standard deviation. 

4. Combine the four samples into a single sample of forty and
find the statistics named in 3.

Sampling Record Sheet

[Blank record form; surviving column headings: Mean *, Standard Deviation]

* In computing the statistics let x denote span and u the number on a disc.
Then $u = x - 57.5$, $\bar{x} = \bar{u} + 57.5$, and $s_x = s_u$.


The results of 3(a) will be reproduced here. There were, of
course, 120 means from samples of N = 10. These were then
grouped into a frequency distribution. The resulting distribution 
and its moments, together with the moments of the universe, are 
given in Table 8. (The computations were made according to the 
definitions given in Part I for the moments of an observed distri¬ 
bution.) 

Although the chief purpose of the experiment is an appreciation
of the theory, it is of interest to compare the experimental and

Table 8. Distribution of the Means of 120 Samples of N = 10 Drawn
from the Universe of Span

Interval       Midpoint    Frequency
67.0-67.3       67.15          1
67.4-67.7       67.55          1
67.8-68.1       67.95          4
68.2-68.5       68.35          4
68.6-68.9       68.75          5
69.0-69.3       69.15         19
69.4-69.7       69.55         27
69.8-70.1       69.95         20
70.2-70.5       70.35         20
70.6-70.9       70.75          7
71.0-71.3       71.15          6
71.4-71.7       71.55          3
71.8-72.1       71.95          3

Moments
Distribution of means:  Mean = 69.785,  $s_{\bar{x}} = 0.8941$,  $\alpha_{3:\bar{x}} = 0.052$,  $\alpha_{4:\bar{x}} = 3.030$
Universe:               $\tilde{x} = 69.943$,  $\sigma_x = 3.115$,  $\alpha_{3:x} = 0.161$,  $\alpha_{4:x} = 3.296$



theoretical results. According to Theorem I the mean should be 
69.943; we obtained 69.785. According to Theorem II the standard
deviation should be $3.115/\sqrt{10} = .985$; we obtained .894.
It is left as an exercise for the student to verify that the
approximations of the $\alpha$'s are also close.
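The comparison can be carried through mechanically. The following sketch applies Theorems I to III, taking the universe moments printed in Table 8 to be mean 69.943, standard deviation 3.115, skewness 0.161, and kurtosis 3.296, and the observed moments of the 120 means to be 69.785, 0.8941, 0.052, and 3.030. The function and dictionary names are ours.

```python
import math

def moments_of_sample_means(mean, sigma, alpha3, alpha4, n):
    """Theorems I-III: moments of the sampling distribution of means for samples of n."""
    return {
        "mean": mean,                       # Theorem I
        "sigma": sigma / math.sqrt(n),      # Theorem II
        "alpha3": alpha3 / math.sqrt(n),    # Theorem III, skewness
        "alpha4": 3 + (alpha4 - 3) / n,     # Theorem III, kurtosis
    }

# Universe moments as read from Table 8, with N = 10.
theory = moments_of_sample_means(69.943, 3.115, 0.161, 3.296, 10)
observed = {"mean": 69.785, "sigma": 0.8941, "alpha3": 0.052, "alpha4": 3.030}

for name in theory:
    print(f"{name:7s} theory = {theory[name]:.4f}   observed = {observed[name]:.4f}")
```

Under these readings the theoretical skewness and kurtosis come out near 0.051 and 3.030, close to the observed values.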

We may think of this “universe” as approximating a Type III
curve and the distribution of Table 8 as approximating its sampling
curve of means (Figure 18). To represent graphically a universe
and the curve of sample means from that universe would require
analytical expressions for both these distributions. As yet, neither
a type of universe has been specified nor has the functional form
of the curve of means from that universe been determined. However,
Figure 18 will help the student appreciate the meaning of some
of the moment relations developed in § 5.

Fig. 18. Depicting the Sampling Distribution of Means
from a Type III Universe

7. Reproductive Property of Normal Law. An important problem 
is to find the distribution function of the sum of several independent 
variables when these variables are normally distributed. It suffices 
to show how this problem can be solved for the sum of two such 
variables. The following discussion follows closely a proof given 
by Jackson.® 

Let x and y be independent variables and normally distributed
about zero as mean with standard deviations $\sigma_1$ and $\sigma_2$, respectively.
Their distribution functions will have the forms

$g(x) = C_1 e^{-ax^2}, \quad h(y) = C_2 e^{-by^2}, \quad a = \dfrac{1}{2\sigma_1^2}, \quad b = \dfrac{1}{2\sigma_2^2};$

the explicit values $1/C_1 = \sigma_1 (2\pi)^{1/2}$, $1/C_2 = \sigma_2 (2\pi)^{1/2}$, for total
frequency 1, are not needed at the moment.

If f(x, y) is the joint distribution function for x and y with marginal
distributions g(x) and h(y) we shall first show that the frequency
function, H(w), for the variable w = x + y is

$H(w) = \int_{-\infty}^{\infty} f(x, w - x)\,dx.$

For $\alpha < w < \beta$, when $\alpha - x < y < \beta - x$; these inequalities define
a strip of the (x, y)-plane for which the corresponding frequency is

$F(\alpha, \beta) = \int_{-\infty}^{\infty} \int_{\alpha - x}^{\beta - x} f(x, y)\,dy\,dx;$

in the integration with respect to y, the substitution w = x + y,
y = w - x, makes

$\int_{\alpha - x}^{\beta - x} f(x, y)\,dy = \int_{\alpha}^{\beta} f(x, w - x)\,dw,$

and hence

$F(\alpha, \beta) = \int_{-\infty}^{\infty} \int_{\alpha}^{\beta} f(x, w - x)\,dw\,dx$
$\quad = \int_{\alpha}^{\beta} \int_{-\infty}^{\infty} f(x, w - x)\,dx\,dw$
$\quad = \int_{\alpha}^{\beta} H(w)\,dw.$

We can now proceed with the main part of the proof. Since x
and y are independent, their joint distribution may be written

$f(x, y) = g(x) h(y)$

and so we have

$H(w) = \int_{-\infty}^{\infty} f(x, w - x)\,dx = C_1 C_2 \int_{-\infty}^{\infty} e^{-ax^2 - b(w - x)^2}\,dx.$

To evaluate this integral write the exponential expression in the form

$ax^2 + b(w - x)^2 = (a + b)\left(x - \dfrac{bw}{a + b}\right)^2 + \dfrac{ab}{a + b}\,w^2 = (a + b)z^2 + cw^2,$

where

$z = x - \dfrac{bw}{a + b}, \qquad c = \dfrac{ab}{a + b} = \dfrac{1}{2(\sigma_1^2 + \sigma_2^2)}.$

The value of w being regarded as constant for the integration with
respect to x, so that incidentally dz = dx, the expression for H(w)
can be written in the form

$H(w) = C_1 C_2 e^{-cw^2} \int_{-\infty}^{\infty} e^{-(a + b)z^2}\,dz = K e^{-cw^2},$

where

$K = C_1 C_2 \int_{-\infty}^{\infty} e^{-(a + b)z^2}\,dz = C_1 C_2 \left[\dfrac{\pi}{a + b}\right]^{1/2}$

and

$\dfrac{1}{2c} = \sigma_1^2 + \sigma_2^2.$

Thus w = x + y is normally distributed about zero with variance
$\sigma_1^2 + \sigma_2^2$.



If x, y, and u are independent and normally distributed, the
quantity x + y + u can be regarded as the sum of the two independent
normally distributed variables x + y and u, and so is itself
normally distributed. The conclusion can be carried over by induction,
without further calculation, to the sum of any finite number
of variables. Hence we have the following theorem.

Theorem VII. If $x_1, x_2, \ldots, x_N$ are independent variables and
normally distributed with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$, the function
$w = \sum_1^N c_i x_i$ is normally distributed with variance $\sum_1^N c_i^2 \sigma_i^2$.

The essential feature of the theorem is the part relating to the 
form of the distribution. This rather remarkable property of a 
linear function of normally distributed variables is sometimes called 
the reproductive property of the normal distribution. The part of 
the theorem relating to the magnitude of the variance follows neces¬ 
sarily from a general formula which was previously established 
without supposing the variables normally distributed or otherwise 
specialized. 
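The reproductive property can also be examined numerically for two variables. The sketch below approximates $H(w) = \int g(x) h(w - x)\,dx$ by the trapezoidal rule and compares it with a normal frequency function of variance $\sigma_1^2 + \sigma_2^2$. The standard deviations, integration limits, and step count are arbitrary choices made for illustration, not part of the proof.

```python
import math

def normal_pdf(v, sigma):
    """Normal frequency function with mean zero and standard deviation sigma."""
    return math.exp(-v * v / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def density_of_sum(w, sigma1, sigma2, half_width=12.0, steps=4000):
    """H(w) = integral of g(x) h(w - x) dx, approximated by the trapezoidal rule.

    The limits +/- half_width * sigma1 and the number of steps are
    numerical conveniences only.
    """
    lo, hi = -half_width * sigma1, half_width * sigma1
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(x, sigma1) * normal_pdf(w - x, sigma2)
    return total * dx

sigma1, sigma2 = 1.5, 2.0                        # illustrative values
sigma_w = math.sqrt(sigma1 ** 2 + sigma2 ** 2)   # standard deviation predicted by the theorem
for w in (0.0, 1.0, 3.0):
    print(w, density_of_sum(w, sigma1, sigma2), normal_pdf(w, sigma_w))
```

Agreement at a few points is of course no proof, but it shows the two columns coinciding to many decimal places.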

Corollary. The sampling distribution of means from a normal 
universe is itself normally distributed. The mean of the sampling dis¬ 
tribution is the same as the mean of the universe and its variance is the 
variance of the universe divided by the size of the sample. 

The proof is left to the student. One should not conclude that it 
is generally true that the means of samples of N are distributed 
according to the same type of function which specifies the universe 
from which they are drawn. But the magnitudes of the mean and 
variance, as given in (11) and (12), are general in the sense that they 
are true for the sampling distribution of means from any infinite 
universe. 

8. Non-Normal Universes. From analytic considerations, comparatively
little is known at present about the exact distributions of
statistics for samples drawn from non-normal universes. In a re¬ 
cent paper, Rietz^ has listed the contributions and summarized the 
progress that has been made in this connection. The reader may 
refer to this paper. 

With regard to the mean, Theorem III tells us that $\alpha_{3:\bar{x}} \to 0$ and
$\alpha_{4:\bar{x}} \to 3$ as $N \to \infty$. So, even though the universe is far from normal,
if the sample is made large enough, the sampling distribution
of $\bar{x}$ approaches the normal form as characterized by skewness and
kurtosis. (The conditions $\alpha_3 = 0$ and $\alpha_4 = 3$ are necessary but not
sufficient conditions for a normal distribution.) Even for comparatively
small values of N there is sufficient experimental evidence to
consider the distribution of $\bar{x}$ as normal to a high degree of approximation.

Finite Universes. So far we have assumed that the universe
was “infinite,” that is, that it was indefinitely large in all its classes,
as compared with the sample. This condition could be satisfied
with a limited supply, for example in the experiment described in
§ 6, by replacement after each individual draw. However, if the
entire sample is drawn from a limited supply before replacement, the
probability of drawing an individual from a given class will be affected
each time that one is drawn from that class. In such a case
the universe is said to be “finite.”

If M is the total frequency of a finite universe, the first four
moments of the sampling distributions of $\bar{x}$ are as follows:

(19)    $E(\bar{x}) = \tilde{x},$

        $\sigma_{\bar{x}}^2 = \dfrac{M - N}{N(M - 1)}\,\sigma_x^2,$

        $\alpha_{3:\bar{x}}^2 = \dfrac{(M - 1)(M - 2N)^2}{N(M - N)(M - 2)^2}\,\alpha_{3:x}^2,$

        $\alpha_{4:\bar{x}} = \dfrac{(M - 1)\{(M^2 - 6MN + M + 6N^2)\,\alpha_{4:x} + 3M(M - N - 1)(N - 1)\}}{N(M - 2)(M - 3)(M - N)}.$

Their origin is doubtful.® They are more general than the formulas
given in (12) and (14) and reduce to them if $M \to \infty$.
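For a very small finite universe the first two relations can be verified by writing down every possible sample. The sketch below assumes the variance relation in the form $\sigma_{\bar{x}}^2 = \frac{M - N}{N(M - 1)}\sigma_x^2$, with $\sigma_x^2$ computed using the divisor M; the five variates and the function name are invented for the illustration.

```python
from fractions import Fraction
from itertools import combinations

def finite_universe_check(universe, n):
    """Compare the moments of all sample means with the finite-universe formulas."""
    m = len(universe)
    mean = Fraction(sum(universe), m)
    # Universe variance with divisor M.
    var = sum((Fraction(v) - mean) ** 2 for v in universe) / m

    # Every distinct sample of n variates drawn without replacement.
    means = [Fraction(sum(s), n) for s in combinations(universe, n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((v - mean_of_means) ** 2 for v in means) / len(means)

    predicted = Fraction(m - n, n * (m - 1)) * var
    return mean, mean_of_means, var_of_means, predicted

# A made-up universe of M = 5 variates, sampled N = 3 at a time.
mean, mean_of_means, var_of_means, predicted = finite_universe_check([2, 3, 5, 8, 13], 3)
print(mean, mean_of_means, var_of_means, predicted)
```

With M = 5 and N = 3 there are C(5, 3) = 10 samples, so the enumeration is short enough to repeat by hand.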

The conclusion of investigators is that the distribution of means 
from nearly any finite universe is practically normal. In this con¬ 
nection the following striking example is given by Carver.® 

A group of students chose arbitrarily the following most unusual
distribution for a parent universe:

Table 9

    x        f
   16        9
    3        2
   29       43
  405      189
 1710       37
Total      280

and found the distribution of $\sum_1^N x_i = N\bar{x}$ of 1000 samples of twenty-five
variates each shown in Table 10. It was obtained as follows.

Table 10

 Class         f
 5,000-        2
 7,000-       54
 9,000-      203
11,000-      310
13,000-      254
15,000-      130
17,000-       36
19,000-        9
21,000-        2
Total       1000



Two hundred and eighty Hollerith cards were punched with numbers 
corresponding to the two hundred and eighty variates of the parent 
population. The cards were thoroughly shuffled and then placed 
in a tabulating machine. After twenty-five cards had run through 
the electric tabulator their total was recorded. By repeating this 
procedure one thousand samples were readily obtained. It is thus 
possible to obtain experimentally some appreciation of the sensi¬ 
tivity of the sampling distribution of means to changes in population 
form. Carver concludes that if the sample N is fifty or larger and
the population is at least ten times N, the parent population has
relatively little control over the shape of the distribution of $\bar{x}$.

Another set of experiments was conducted by Shewhart^® who 
comes to the following conclusion: 

Such evidence, supported by more rigorous analytical methods beyond the 
scope of the present discussion, leads us to believe that in almost all cases in 
practice we may establish sampling limits for averages of samples of four or more 
upon the basis of normal law theory. 

9. Tchebycheff’s Inequality. In (1) replace x by w, let $\phi(w) = (w - \tilde{w})^2$,
and in the expression for $E\{(w - \tilde{w})^2\}$ replace all values
of w larger than $\tilde{w} + \delta\sigma$ by $\tilde{w} + \delta\sigma$ and all values of w less than
$\tilde{w} - \delta\sigma$ by $\tilde{w} - \delta\sigma$ where $\delta$ is a positive number. Then

(20)    $E\{(w - \tilde{w})^2\} \ge k^2 + (\delta\sigma)^2 P_\delta,$

where

$k^2 = \int_{\tilde{w} - \delta\sigma}^{\tilde{w} + \delta\sigma} (w - \tilde{w})^2 f(w)\,dw \ge 0,$

and $P_\delta$ is the probability that w lies outside the interval
$(\tilde{w} - \delta\sigma, \tilde{w} + \delta\sigma)$. From (20) we have

(21)    $P_\delta \le \dfrac{1}{\delta^2},$

and therefore the following theorem.

Theorem VIII. The probability is not more than $1/\delta^2$ that a value
of w taken at random from the universe f(w) will differ from its expected
value by more than a multiple $\delta$ of its standard deviation.

This theorem is known variously as Tchebycheff’s theorem, 
criterion, or inequality. A striking property is its independence 
of the nature of the distribution of w. But the gain in generality 
must be paid for and the price is inadequate information about the 
particular. That is, the inequality (21) may be too wide to be of 
practical value in passing judgments on sampling fluctuations in a 
known or proposed distribution. Nevertheless, it does have some 
useful applications, two of which will now be given. 
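To see how wide the inequality can be, one may set the bound $1/\delta^2$ beside the exact probability for a normally distributed w. This is only an illustrative sketch; the function names are ours, and the normal case is chosen because its tail probability is available through the complementary error function.

```python
import math

def tchebycheff_bound(delta):
    """Upper bound of Theorem VIII on the probability of a deviation beyond delta * sigma."""
    return 1.0 / delta ** 2

def normal_tail(delta):
    """Exact two-sided probability of such a deviation when w is normally distributed."""
    return math.erfc(delta / math.sqrt(2))

for delta in (1.5, 2.0, 3.0):
    print(delta, round(tchebycheff_bound(delta), 4), round(normal_tail(delta), 4))
```

At $\delta = 3$ the bound is 1/9, while the normal probability is about .0027, so the inequality holds with a great deal of room to spare.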

10. Law of Large Numbers. The Bernoulli theorem (Chapter I,
§ 7) can now be established. Let w = x/s, x being the number of successes
in s trials. Then $\tilde{w} = p$. Let $P_\delta$ be the probability that
x/s lies outside the interval $(p - \epsilon, p + \epsilon)$, where $\epsilon > 0$. We may
take $\epsilon = \delta (pq/s)^{1/2}$, a multiple of the standard deviation of the
relative frequency x/s. Accordingly, by Theorem VIII we have

$P_\delta \le \dfrac{1}{\delta^2}.$

Since

$\dfrac{1}{\delta} = \dfrac{(pq/s)^{1/2}}{\epsilon},$

we obtain the inequality

$P_\delta \le \dfrac{p(1 - p)}{s\epsilon^2}.$

For any assigned $\epsilon$, $P_\delta$ can be made arbitrarily small by increasing s.
Thus x/s becomes increasingly reliable as an estimate of p as s
increases.
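The behavior of the bound $p(1 - p)/(s\epsilon^2)$ as s increases can be tabulated alongside the exact binomial probability. The sketch below is illustrative only; the values p = 0.5 and $\epsilon$ = 0.1 and the function names are assumptions, and the exact probability is obtained by summing binomial terms.

```python
import math

def bernoulli_bound(p, s, eps):
    """Tchebycheff bound p(1 - p) / (s * eps**2) on the chance that x/s misses p by more than eps."""
    return p * (1 - p) / (s * eps ** 2)

def exact_tail(p, s, eps):
    """Exact binomial probability that x/s differs from p by more than eps."""
    q = 1 - p
    return sum(
        math.comb(s, x) * p ** x * q ** (s - x)
        for x in range(s + 1)
        if abs(x / s - p) > eps
    )

p, eps = 0.5, 0.1   # illustrative values
for s in (25, 100, 400):
    print(s, round(bernoulli_bound(p, s, eps), 4), round(exact_tail(p, s, eps), 4))
```

Both columns shrink as s grows, the exact probability much faster than the bound.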
The inequality of Tchebycheff can also be used to prove the stability
of the means of large samples. Consider a sample of N from
f(x) in which the variance is $\sigma_x^2$. Let w be a linear function of the
sample defined by

$w = \dfrac{x_1 + x_2 + \cdots + x_N}{N}.$

Suppose c is a constant such that $\sigma_x^2 \le c^2$. Since $w = \bar{x}$, we have

$\sigma_w^2 = \dfrac{\sigma_x^2}{N} \le \dfrac{c^2}{N}.$

Let P be the probability that $(\bar{x} - \tilde{x})^2 > h^2$. That is, P is the
probability that

$(\bar{x} - \tilde{x})^2 > \dfrac{Nh^2}{c^2} \cdot \dfrac{c^2}{N} \ge \dfrac{Nh^2}{c^2}\,\sigma_w^2.$

Therefore, from Theorem VIII,

$P \le \dfrac{c^2}{Nh^2}.$

Since c and h are fixed, P can be made arbitrarily small by taking N
sufficiently large. Hence we have the following theorem.

Theorem IX. The probability that the mean of a sample of N variates
will differ numerically by more than a given positive number h from the
mean of the universe can be made arbitrarily small by taking N
sufficiently large.

Under the conditions of the theorem, $\bar{x}$ is said to converge stochastically
to $\tilde{x}$. This type of convergence, however, should not be confused
with convergence in the sense of analysis.

11. Probability Scale of Sampling Fluctuations. Now that the 
personae dramatis have been assembled, we can state a theorem 
which tells us what the approximate probability is that the mean of 
a sample will deviate by an assigned amount from a hypothetical 
mean. We are assuming here that $\sigma_x$ is known; the case where $\sigma_x$
is unknown will be discussed later.

We know that $\bar{x}$ is (or tends to be) normally distributed about $\tilde{x}$
with standard deviation $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$. If the distribution of $\bar{x}$ be
reduced to standard units by the transformation

(22)    $t = \dfrac{\bar{x} - \tilde{x}}{\sigma_{\bar{x}}},$

then we know that t is approximately normally distributed about
zero with standard deviation of unity. Hence we can refer to a normal
probability scale for the probability that one would obtain a
random sample for which $\bar{x}$ differs from $\tilde{x}$ by as much as $|\delta|$, where
$\delta$ is expressed in the $\sigma_{\bar{x}}$ unit. So we have the following theorem.

Theorem X. The probability $Q_\delta$ that a random sample from an
infinite universe will have a mean, $\bar{x}$, which will be within an interval $\delta$
of the mean, $\tilde{x}$, of the universe is approximately

$Q_\delta = 2\int_0^{\delta} \phi(t)\,dt,$

where $\delta$ is the observed value of t given by (22) and $\phi(t)$ is the normal
curve. Then $P_\delta = 1 - Q_\delta$ is the approximate probability that $\bar{x}$ will
not be within $|\delta|$ of $\tilde{x}$. If the universe is normal, $P_\delta$ gives the exact
probability.

Fig. 19. $Q_\delta$ is the Probability for a Deviation as Small as $|\delta|$,
and $P_\delta$ is the Probability for a Deviation as Large as $|\delta|$

12. Null Hypothesis and Significance Tests. The rationale underlying
sampling theory has been summarized by E. S. Pearson^^
as follows:

In applying the methods of statistical analysis it is generally our aim to discriminate
between two or more alternative hypotheses regarding the factors
which have controlled certain observed events, which form what we term a
sample or samples. If the process is examined in a little detail it will be found
that the procedure may be described as follows:

(a) We define a hypothesis to be tested. 

(b) We choose the criterion (or criteria) whose numerical value, derivable 
from the observations, is most suitable for testing the hypothesis. In 
doing this we recognize that the criterion is not a single-valued expression 
even if the hypothesis be true, but will vary from one sample of observa¬ 
tions to another. 

(c) We therefore refer the observed value of the criterion to this sampling
distribution (e.g., to a normal probability scale, etc.) and so obtain a
measure of the likelihood of the hypothesis.

(d) Finally, if judged on this probability scale the observed criterion is not 
exceptional, we conclude that upon the information available there are no 
grounds for discarding the hypothesis; or if the value prove exceptional 
we consider the possibility of alternative hypotheses. 

An hypothesis which is tested for possible rejection under the 
assumption that it is true has been called by Fisher a null hypothesis. 
In other words, null hypothesis refers to a particular form of population
distribution which is assumed in considering whether or not a
sample could reasonably have arisen from the population which, 
in fact, was assumed. If the sample could not reasonably have 
arisen from the population proposed, as measured by a significance 
test, we say that the null hypothesis is refuted for the level of signifi¬ 
cance adopted. If the significance test yields a verdict of “not 
significant ” for the probability level adopted, we say that the null 
hypothesis is not refuted or contradicted at that level. 

It is open to the investigator to be more or less exacting concerning 
the smallness of the probability he would require before he would be 
willing to admit that his test has demonstrated a significant result. 
Good judgment in these matters comes only from much experience 
in the particular field in which the problem occurs. However, it is 
conventional among certain workers to adopt the following rule: 

If $P_\delta > .05$, $\delta$ is not significant;
if $P_\delta < .01$, $\delta$ is significant;
if $.05 > P_\delta > .01$,

our conclusions about $\delta$ are doubtful and we cannot say with much
certainty whether the deviation is significant or not until we have additional
information. Other workers prefer a more conservative level
of significance.

Example 1. Suppose the mean span of 100 persons is found to be $\bar{x} = 70.66$
inches. Does this differ significantly from the mean $\tilde{x} = 69.943$ of the “universe”
with standard deviation $\sigma_x = 3.115$? Calculating the above test we find

$\delta = \dfrac{\bar{x} - \tilde{x}}{3.115/\sqrt{100}} = 1.99.$

Referring to the normal probability scale we find
the chance of a difference between the observed and hypothetical means as large
as that noted to be $P_\delta = .0471$. Our conclusion is that the given statistic
$\bar{x} = 70.66$ is not exceptional, although it is possible that it came from a different
universe, that is, in this case a different race of men.

Example 2. Twelve dice were thrown 26,306 times (Weldon's data), and a
throw of 5 or 6 points was reckoned a success. The mean of the observed distribution
was found to be 4.0524. In tossing a true die the chance of scoring 5 or
6 is 1/3 so the number of dice scoring 5 or 6 should be distributed with frequencies
proportional to the terms in the expansion $(2/3 + 1/3)^{12}$. Therefore, the expected
mean, on the hypothesis that the dice were true, is sp = 12(1/3) = 4. Test this
hypothesis using the difference between the observed and theoretical means as
a criterion of judgment.

Solution. $\sigma_x = (spq)^{1/2} = \{(12)(1/3)(2/3)\}^{1/2} = 1.633,$

N = 26,306,

$\sigma_{\bar{x}} = \dfrac{\sigma_x}{N^{1/2}} = .010,$

$\delta = \dfrac{.0524}{.010} = 5.2.$

The probability that a deviation outside $\delta = \pm 5$ would happen by chance is
extremely small so we conclude that the dice were biased.
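The arithmetic of Example 2 can be retraced in a few lines. This sketch uses only the figures quoted in the example; the variable names are ours, and the two-sided normal probability is obtained from the complementary error function.

```python
import math

s, p = 12, 1 / 3          # twelve dice, chance of a 5 or 6 on a true die
n_throws = 26306          # number of throws (Weldon's data)
observed_mean = 4.0524

expected_mean = s * p                          # sp = 4
sigma_x = math.sqrt(s * p * (1 - p))           # (spq)^(1/2)
sigma_mean = sigma_x / math.sqrt(n_throws)     # Theorem II
delta = (observed_mean - expected_mean) / sigma_mean
p_delta = math.erfc(abs(delta) / math.sqrt(2))  # two-sided normal probability

print(round(sigma_x, 3), round(sigma_mean, 4), round(delta, 2), p_delta)
```

The probability comes out far below the .01 level, in agreement with the conclusion that the dice were biased.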

13. Size of Sample to Have a Given Reliability. From Theorem X
we may determine the size N of a sample such that its mean, $\bar{x}$, will
not differ from $\tilde{x}$ by more than a specified error $|\delta|$, with a degree of
certainty equal to a specified probability.


Example 3. The American Rolling Mill Company investigated the life of 
ferrous materials under different corrosive conditions. Data obtained from a 
certain kind of sheet material immersed in Washington tap water showed that 
the average time of failure of such sample was 874.89 days and the standard 
deviation of the time of failure was 85.31 days. There arose the following 
question of practical interest to the research engineer of this company: What 
sample size N must be used in order that for similar test conditions, the prob¬ 
ability shall be 0.90 that the average time for failure determined from the N 
tests will be in error by not more than 5 per cent of the average of the universe? 

Assuming that $\tilde{x} = 874.89$ and that means of samples of N are distributed
normally, we may answer this question as follows: The allowable error is 5 per
cent of 874.89 days or 43.74 days, and this must correspond to a probability of
0.90. From Theorem X we have

$Q_\delta = 2\int_0^{\delta} \phi(t)\,dt = .90,$

that is

$\int_0^{\delta} \phi(t)\,dt = .45,$

whence from the tables we find $\delta = 1.645$. Hence N is found by solving the
equation

$1.645\,\dfrac{\sigma_x}{\sqrt{N}} = 43.74,$

where $\sigma_x = 85.31$. We find N = 10.
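The same calculation can be sketched in code, with the table lookup replaced by the inverse normal distribution function from the Python standard library. The function name and its parameters are ours; the figures are those of Example 3.

```python
from statistics import NormalDist

def sample_size(sigma_x, allowed_error, q_delta):
    """Solve delta * sigma_x / sqrt(N) = allowed_error for N, where delta
    satisfies Q_delta = q_delta on the normal probability scale."""
    delta = NormalDist().inv_cdf(0.5 + q_delta / 2)
    return delta, (delta * sigma_x / allowed_error) ** 2

delta, n = sample_size(sigma_x=85.31, allowed_error=43.74, q_delta=0.90)
print(round(delta, 3), round(n, 2))
```

The unrounded solution lies a little above 10, which the example reports as N = 10.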


14. Difference in Proportions. In the analysis of data obtained 
by sampling, certain problems occur which relate to the significance 
of apparent differences in proportions. Suppose we have two random 
samples of size $n_1$ and $n_2$, respectively, with $x_1$ individuals of the
$n_1$ items and $x_2$ of the $n_2$ items which have a certain character or
attribute. The question arises as to whether the observed difference
is merely an accident of sampling or whether a similar difference
exists in the universe. The following theorem may be used to test
the null hypothesis that $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from the same universe.

Theorem XI. If $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from an infinite universe in which p is the proportion of individuals
which have the character in question, the probability that the
difference in the proportions obtained will be numerically as great as the
observed difference $w = |x_1/n_1 - x_2/n_2|$ is approximately $P_\delta$, where
$P_\delta$ is defined in Theorem X, and

$\delta = \dfrac{w}{\left\{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}.$

Proof. According to the Bernoulli theory, $x_1/n_1$ will vary about
an expected value p with variance $pq/n_1$, where q = 1 - p. Similarly,
$x_2/n_2$ will vary about p with variance $pq/n_2$. Then

$E\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) = p - p = 0,$

and from (10),

(23)    $\sigma_w^2 = \dfrac{pq}{n_1} + \dfrac{pq}{n_2}.$

Therefore, w varies about zero with variance given by (23), and the
ratio

(24)    $t = \dfrac{x_1/n_1 - x_2/n_2}{\left\{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}$

varies about zero with unit standard deviation.

Information about the form of the t distribution may be obtained
from its higher moments. It is not difficult to show that

(25)    $\alpha_3^2 = \dfrac{1 - 4pq}{pq} \times \dfrac{(n_1 - n_2)^2}{n_1 n_2 (n_1 + n_2)},$

        $\alpha_4 = 3 + \dfrac{1 - 6pq}{pq} \times \dfrac{n_1^2 - n_1 n_2 + n_2^2}{n_1 n_2 (n_1 + n_2)}.$

For fixed values of p and q, it is clear that $\alpha_3 \to 0$ and $\alpha_4 \to 3$ as the
samples are taken indefinitely large. Even for moderately small
samples the distribution of t does not differ greatly from the normal
form. The following empirical rule, suggested by E. S. Pearson, is
useful when one is in doubt about the propriety of referring (24)
to the normal probability scale.

Rule. Suppose $n_1 < n_2$ (we are at liberty to call either $n_1$). If
$n_1 p > 5$, the use of the normal probability scale is justified. If
$n_1 p \le 5$, examine $\alpha_3^2$. If $\alpha_3^2 < .04$, it is still sufficiently accurate.
But if $\alpha_3^2 \ge .04$, no great confidence can be placed in the test.

In order to apply Theorem XI an estimate of p is usually required.
For this purpose

(26)    $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

is usually taken as the best estimate of p which is available from
the samples. It is easy to show that $E(\bar{p}) = p$.
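The procedure of this section can be sketched as follows, taking the pooled proportion $(x_1 + x_2)/(n_1 + n_2)$ as the estimate of p and the ratio (24) in the form $t = (x_1/n_1 - x_2/n_2)/\{pq(1/n_1 + 1/n_2)\}^{1/2}$. The counts are made up for illustration and the function name is ours.

```python
import math

def difference_in_proportions(x1, n1, x2, n2):
    """Return the pooled estimate of p, the ratio t, and P_delta on the normal scale."""
    p_hat = (x1 + x2) / (n1 + n2)
    q_hat = 1 - p_hat
    sigma_w = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))   # square root of the variance of w
    t = (x1 / n1 - x2 / n2) / sigma_w
    p_delta = math.erfc(abs(t) / math.sqrt(2))   # chance of a difference at least this large
    return p_hat, t, p_delta

# Made-up counts: 30 of 100 in the first sample, 40 of 200 in the second.
p_hat, t, p_delta = difference_in_proportions(30, 100, 40, 200)
print(round(p_hat, 4), round(t, 3), round(p_delta, 4))
```

Here $n_1 p$ is well above 5, so by the rule quoted above the reference to the normal probability scale would be regarded as justified.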
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Problems 

1. Suppose a variable w is normally distributed and a value is selected at ran¬ 

dom. Show that the odds are about 369 to 1 against the value differing 
from E{w) by more than 3 ctfs- 

2. (a) Consider a finite universe of 5 variates: Xi, X2, xs, X4, xs . The number 

of distinct samples of 3 variates each that may be drawn is C(5, 3) = 10. 
Write these down. 

(6) Let Si represent the ith samide mean and write down the 10 distinct 
sample means. For example, 

Xi X2 -h Xs 

-i- 

(c) Show that the mean of the 10 values of Xi is the mean of the 6 values of 
Xi, Thus, 

-i:2<= 2- 

What formula does this example illustrate? 

3. Show that the expected value of is greater than the square of the expected 

value of w. 

4. From a box containing 2000 discs representing the distribution of
span, draw a sample of 25 and compute its mean and standard deviation.
Test the significance of the difference between your mean and the mean
of the universe, $\tilde{x} = 69.943$ inches.

5. Suppose the weights of a sample of 1000 men of the same age are
obtained, yielding $\bar{x} = 140$ lbs. Assuming that $\sigma_x = 20.0$
lbs., what is the standard error of the mean of this sample? What is
the probability that this mean does not differ from the mean of the
universe at this age by more than five pounds?

6. (Camp) The mean age of death of men who are alive at age 20 is, in
the United States, 59.13. For the city of Chicago it is 58.98, and in
1910 the male population of age 20 was 24,000. Can the difference
between the United States and Chicago be explained on the hypothesis
of chance? Assume $\sigma_x = 10$ years, and that the distribution of
the universe is approximately normal.

7. (Camp) A fraternal organization wishes to be very sure that the
average age of death in its group of men now aged 20 will not differ
from the expected 59.13 years by more than one year. By “very sure” it
means that the probability must equal .999 or more. How large should
the group be? (Assume as before that $\sigma_x = 10$.)

8. Given that

$$w = \sum_{i=1}^{k} (a + x_i).$$

If the $x$’s are independent and $a$ is a constant, show that

$$\sigma_w^2 = \sum_{i=1}^{k} \sigma_i^2,$$

where $\sigma_i^2$ represents the variance of $x_i$.

9. Find the mean value of all positive ordinates of the first quadrant
of

$$x^2 + y^2 = r^2,$$

(a) when equally spaced along the $x$-axis,

(b) when equally spaced along the circle.

Answers: (a) $\pi r/4$; (b) $2r/\pi$.


10. Find the mean value of all the ordinates of the curve y = a + from
0 to $x$, when equally spaced along the $x$-axis.

11. Derive (25). Hint: $\alpha_3 = E(t^3)$.

12. Show that the moment relations in (19) reduce to the corresponding
relations in (12) and (14) if $M \to \infty$.

13. Suppose 300 mice having cancer of about the same degree of
malignancy were divided at random into two groups of $n_1 = 100$ and
$n_2 = 200$, respectively. The first group was given a certain serum
treatment which was withheld from the second group but otherwise the
two groups were treated alike. Among the serum treated there were
$x_1 = 8$ deaths, and among the other group there were $x_2 = 25$
deaths. Test the significance of the difference between the mortality
of 8% and 12½% in the two groups.

14. An instructor had two classes of 20 and 30 students in the same subject. 

Four in the smaller class and 8 in the larger made grades of B or better. 
Should one seek a further explanation of this difference beyond variation 
due to sampling? 



CHAPTER VII 

SMALL OR EXACT SAMPLING THEORY 

1. Introduction. A theory of sampling which assumes that $N$ is
large is inadequate for many practical problems. In recent years
a theory has been developed to give more exact methods in dealing
with small samples. In the practical field, the call for the solution
of problems based on comparatively few observations was first
realized in 1908 by a young man, then unknown, who chose to
publish his results under the now celebrated pseudonym of “Student.”
Since then, many important contributions have been made toward
the development and extension of this theory. Its applications are
widespread. In the opinion of the present writer, continuity between
large and small sample theory is an essential part of the newer
attitude. In general, the methods of small sample theory are
applicable to large samples, although the reverse is not true.
It is our purpose in this chapter to facilitate an appreciation of
some of the simpler aspects of this theory. The treatment centers
around significance tests for means, variances, and correlation
coefficients.

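2. Expected Value of $s^2$. By definition, the variance of a sample
is given by

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2, \tag{1}$$

where $\bar{x}$ denotes the mean of the sample; the mean of the
universe is denoted by $\tilde{x}$. Then the expected value of $s^2$
from repeated samples is

$$E(s^2) = \frac{1}{N} E(x_1^2 + x_2^2 + \cdots + x_N^2) - E(\bar{x}^2).$$

Since the $x$’s constitute a sample we may write

$$E(x_1^2 + x_2^2 + \cdots + x_N^2) = N E(x^2),$$

and from (16) of Chapter VI, replacing $y$ by $x$ there, we have

$$E(\bar{x}^2) = \frac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\}.$$

Therefore,

$$E(s^2) = \frac{1}{N}\{N E(x^2)\} - \frac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\} = \frac{N - 1}{N}\{E(x^2) - \tilde{x}^2\}.$$

Hence

$$E(s^2) = \frac{N - 1}{N}\sigma^2, \tag{2}$$

where $\sigma^2$ is the variance of $x$.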

We may also obtain (2) as follows: Consider independent samples
each containing $N$ variates $u_1, u_2, \cdots, u_N$, where
$u_i = x_i - \tilde{x}$. For any sample,

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} u_i^2 - \bar{u}^2 = \frac{N - 1}{N^2}\sum_{i=1}^{N} u_i^2 - \frac{2}{N^2}\sum_{i<j} u_i u_j,$$

since the square of a sum is equal to the sum of the squares plus twice
the cross-products. Then

$$E(s^2) = E\left[\frac{N - 1}{N^2}\sum_{i=1}^{N} u_i^2 - \frac{2}{N^2}\sum_{i<j} u_i u_j\right].$$

By Proposition III of Chapter VI the right-hand member of the
above expression may be written

$$\frac{N - 1}{N^2}\sum_{i=1}^{N} E(u_i^2) - \frac{2}{N^2}\sum_{i<j} E(u_i u_j),$$

which becomes

$$\frac{N - 1}{N}\sigma^2 - \frac{2}{N^2}\sum_{i<j} E(u_i u_j).$$

Since $E(u_i u_j) = 0$, by Proposition V, we have the final result

$$E(s^2) = \frac{N - 1}{N}\sigma^2.$$

This result is sometimes stated as in the following theorem.

Theorem I. The mean of the sampling distribution of $s^2$ from an
arbitrary universe equals the variance of the universe multiplied by the
factor* $(N - 1)/N$.

It is to be anticipated that the expected value of $s^2$ is less than
$\sigma^2$, as the following analysis will show. The variance $\sigma^2$
refers to deviations from $\tilde{x}$, whereas any $s^2$ refers to
deviations from an $\bar{x}$. For any sample, then, we may regard
$\tilde{x}$ as an arbitrary origin. Since in the case of any sample,
the sum of the squares of deviations from its mean, $\bar{x}$, is less
than the sum of the squares of deviations of the same variates from an
arbitrary point $\tilde{x}$ (unless the sample is one whose mean falls
at $\tilde{x}$), it is to be expected that the mean of all the values
of $s^2$ will be less than $\sigma^2$. Relation (2) measures the extent
of this inequality.
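
Relation (2) holds for an arbitrary universe, and this can be seen
in a sampling experiment. The sketch below is an illustration only and
not part of the original argument: the universe (uniform on the unit
interval, with variance 1/12), the sample size N = 4, the number of
trials, and all names are our own assumptions.

```python
import random


def sample_variance(xs):
    """s^2 as defined in (1): mean of the squares minus square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean


def mean_of_s2(n, trials, rng):
    """Average s^2 over repeated samples of n from a uniform universe."""
    total = 0.0
    for _ in range(trials):
        total += sample_variance([rng.random() for _ in range(n)])
    return total / trials


rng = random.Random(0)          # fixed seed so the run is repeatable
N = 4
sigma2 = 1.0 / 12.0             # variance of the uniform universe on [0, 1)
estimate = mean_of_s2(N, 100_000, rng)
theory = (N - 1) / N * sigma2   # right-hand member of (2)
print(estimate, theory, sigma2)
```

The simulated mean of s² should settle near (N − 1)σ²/N and visibly
below σ² itself.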

3. Unbiased Estimates of Population Parameters. A distribution
function is not only a function of the variable involved, but it is also
a function of the parameters, or hypothetical quantities, which are
introduced to specify the universe sampled. In the case of a
Bernoulli distribution the parameter is $p$, in the Poisson law
it is $m$, and in a normal distribution there are two parameters,
$\tilde{x}$ and $\sigma$.

A function of the variates given by a sample for estimating a
parameter is called a statistic. Let $\hat{\theta}$ be a statistic
corresponding to a parameter $\theta$ in the universe. We now state the
following

Definition. If the expected value of $\hat{\theta}$, $E(\hat{\theta})$,
equals $\theta$ then $\hat{\theta}$ is called an unbiased estimate of
$\theta$.

It is clear from Theorem I of Chapter VI that the mean of a sample
is an unbiased estimate of the mean of the universe. Also from (26)
of Chapter VI we see that $\hat{p}$ defined there is an unbiased
estimate of $p$.

Before the relation $\sigma_{\bar{x}}^2 = \sigma_x^2/N$ can be of much
use to us in the applications we must have an estimate of $\sigma_x^2$
from the sample or samples available. By Proposition I of the preceding
chapter,

$$E\left(\frac{N}{N - 1}s^2\right) = \frac{N}{N - 1}E(s^2) = \sigma^2 \quad \text{by (2)}.$$


* This factor is sometimes called “Bessel’s correction.” Perhaps it
should be attributed more appropriately to Gauss who made use of it, in
this connection, as early as 1823.

Let $\hat{\sigma}^2$ be an unbiased estimate of $\sigma^2$. If this
estimate is based on a single sample we have

$$\hat{\sigma}^2 = \frac{N}{N - 1}s^2 = \frac{1}{N - 1}\sum_{i=1}^{N}(x_i - \bar{x})^2. \tag{3}$$

If $n = N - 1$ it is obvious that

$$s^2 = \frac{n}{n + 1}\hat{\sigma}^2. \tag{3a}$$

It is conventional to take

$$\hat{\sigma} = \left(\frac{N}{N - 1}\right)^{1/2} s \tag{4}$$

as an estimate of $\sigma$. If $N$ is large the difference between
unity and the coefficient of $s$ in (4) is negligible in numerical
problems. With $N$ large it would not be invalid, to any appreciable
extent, to use $s$ as an estimate of $\sigma$.

If two independent samples are available from the same universe,
an unbiased estimate based on the two samples is given by

$$\hat{\sigma}^2 = \frac{q}{N - 2}, \tag{5}$$

where

$$q = N_1 s_1^2 + N_2 s_2^2, \qquad N = N_1 + N_2,$$

$s_1^2$ and $s_2^2$ being the variances of samples consisting of $N_1$
and $N_2$ variates, respectively. It is left as an exercise for the
student to verify that the expected value of $q/(N - 2)$ is $\sigma^2$.

In case $k$ independent samples are available from the same universe,
we may generalize (5) and write

$$\hat{\sigma}^2 = \frac{Q}{U - k}, \tag{6}$$

where

$$Q = N_1 s_1^2 + N_2 s_2^2 + \cdots + N_k s_k^2, \qquad U = N_1 + N_2 + \cdots + N_k,$$

and $s_i^2$ is the variance in the $i$th sample consisting of $N_i$
variates. When $\hat{\sigma}^2$ is used in future discussions it will be
clear from the context whether this estimate is based on 1, 2, or $k$
samples.

If $N_i = N$ is the same for every sample, (6) reduces to

$$\hat{\sigma}^2 = \frac{N(s_1^2 + s_2^2 + \cdots + s_k^2)}{U - k}, \tag{7}$$

where $U = Nk$. Clearly, (7) may be written in the form

$$\frac{N - 1}{N}\hat{\sigma}^2 = \frac{1}{k}(s_1^2 + s_2^2 + s_3^2 + \cdots + s_k^2). \tag{7a}$$

When $k$ is taken infinitely large so that $U$ becomes the universe, the
right member of (7a) then refers to the expected value of $s^2$ and
$\hat{\sigma}^2$ becomes $\sigma^2$ itself. So as $k \to \infty$ the
limiting value of (7a) becomes

$$\frac{N - 1}{N}\sigma^2 = E(s^2),$$

as given in (2).
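
The estimates (3), (5), and (6) differ only in the number of samples
pooled, so one routine serves for all three. The Python sketch below is
our own illustration; the function names and the small data sets are
hypothetical and chosen only so that the arithmetic can be checked by
hand.

```python
def variance_divisor_n(xs):
    """Sample variance s^2 with divisor N, as in (1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def pooled_variance_estimate(samples):
    """Unbiased estimate Q / (U - k) of (6) from k independent samples."""
    q_total = sum(len(s) * variance_divisor_n(s) for s in samples)  # Q
    u_total = sum(len(s) for s in samples)                          # U
    k = len(samples)
    return q_total / (u_total - k)


first = [1.0, 2.0, 3.0]          # N1 = 3, N1 * s1^2 = 2
second = [2.0, 4.0, 6.0, 8.0]    # N2 = 4, N2 * s2^2 = 20

print(pooled_variance_estimate([first]))           # single sample, (3)
print(pooled_variance_estimate([first, second]))   # two samples, (5)
```

With one sample the routine returns Ns²/(N − 1); with the two samples
shown, Q = 22 and U − k = 5.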

As an alternate to (7), in the case where all samples contain the
same number of variates, we may take

$$\hat{\sigma} = \frac{1}{b(N)} \times \frac{1}{k}(s_1 + s_2 + s_3 + \cdots + s_k) = \frac{1}{b(N)} \times \text{mean value of standard deviations}, \tag{8}$$

where $b(N)$ is a function of $N$ and approaches unity as $N$ increases.
The exact expression for $b(N)$ will be derived in § 7. Its approximate
value is $b(N) = 1 - 3/(4N)$. As $k \to \infty$ the limiting value of
(8) becomes

$$\sigma = \frac{E(s)}{b(N)}. \tag{9}$$

In § 7 we will prove that $b(N)\sigma$ is the mean of the sampling
distribution of $s$ from a normal universe whose standard deviation is
$\sigma$. Values of $b(N)$ and its reciprocal have been tabulated by
E. S. Pearson and others, and we have included a short table in § 7.

As an alternate to (4) we have from (8) when $k = 1$,

$$\hat{\sigma} = \frac{s}{b(N)}. \tag{10}$$

4. Degrees of Freedom. In § 2 we have proved, essentially, that
the expected value of $\sum (x_i - \bar{x})^2$ is $(N - 1)\sigma^2$,
where the $N$ values of $x$ in the sample are subject to the linear
restriction $\sum x_i = N\bar{x}$. This is equivalent to proving that
the expected value of $\sum x_i^2$ is $(N - 1)\sigma^2$ when the $x$’s
are subject to the linear restriction $\sum x_i = 0$. Suppose, however,
that there are $k < N$ linear restrictions on the $x$’s. What, then, is
the expected value of $\sum x_i^2$? A. T. Craig has proved analytically
that if $x_1, x_2, \cdots, x_N$ are $N$ independent values of a variable
which is normally distributed about zero with variance $\sigma^2$ and if
the $N$ values of $x$ are subject to $k < N$ homogeneous linear
restrictions, then the expected value of $\sum x_i^2$ is
$(N - k)\sigma^2$. The number $n = N - k$ is frequently called the
number of degrees of freedom.

5. “Student’s” Distribution. The formula used in testing a null
hypothesis that a given sample comes from a universe with a proposed
mean is

$$t = \frac{(\bar{x} - \tilde{x})N^{1/2}}{\sigma}. \tag{11}$$

As stated in Chapter VI, (11) is normally distributed if the universe
is normal. On the side of applications, $\sigma$ is seldom available and
usually must be estimated from the data available. If we substitute
into (11) the estimate of $\sigma$ given in (4) and calculate

$$t = \frac{(\bar{x} - \tilde{x})(N - 1)^{1/2}}{s}, \tag{12}$$

we are not justified in asserting that (12) is normally distributed
unless $N$ is large. And so, in testing the significance of the mean
of a small sample we are not justified in referring (12) to a normal
probability scale. The variability of $s$ from sample to sample
invalidates that procedure.

While Helmert obtained the distribution of $s^2$ as early as 1876
it seems that “Student” was the first to recognize the importance,
for the theory of small samples, of taking account of the variability
of $s$ in (12). By means of a remarkable intuition he obtained,
somewhat empirically, the joint distribution function of $\bar{x}$ and
$s$ from a normal universe. Later writers, notably Fisher, established
his results rigorously.

“Student” actually found the distribution of a slightly different
variable, viz.,

$$z = \frac{\bar{x} - \tilde{x}}{s}. \tag{13}$$

Obviously, $z$ is functionally related to $t$ by

$$z = t(N - 1)^{-1/2}, \tag{14}$$

so the distribution of $t$ can easily be obtained from that of $z$. In
deriving the distribution of $z$ we shall follow the proof given by
Fisher. To avoid interrupting the main development some of the
details will be deferred to the next section.

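Consider a normal universe with frequency element

$$df = (2\pi\sigma^2)^{-1/2} e^{-(x - \tilde{x})^2/2\sigma^2}\,dx.$$

Let a sample $(x_1, x_2, \cdots, x_N)$ be taken at random from it. Then
the probability that the sample will lie in the element of volume

$$dv = dx_1\,dx_2 \cdots dx_N$$

is

$$dF = (2\pi\sigma^2)^{-N/2} e^{-\sum (x_i - \tilde{x})^2/2\sigma^2}\,dv, \tag{15}$$

where the sum extends over the $N$ variates of the sample. From the
relation* $\mu_2 = \nu_2 - \nu_1^2$ we have

$$\sum_{i=1}^{N} (x_i - \tilde{x})^2 = N s^2 + N(\bar{x} - \tilde{x})^2.$$

Hence (15) may be written

$$dF = (2\pi\sigma^2)^{-N/2} e^{-N[s^2 + (\bar{x} - \tilde{x})^2]/2\sigma^2}\,dv. \tag{16}$$

* Cf. Part I, Ch. IV, § 9.

By means of $N$-dimensional geometry (to be explained in § 6) Fisher
showed that the element of volume $dv$ can be expressed in terms of
the variation of $\bar{x}$, namely, $d\bar{x}$, and the variation in
volume, $d(s^{N-1})$, of an $(N - 1)$-dimensional hypersphere of radius
$s\sqrt{N}$, so that

$$dv = C s^{N-2}\,d\bar{x}\,ds, \tag{17}$$

where $C$ is a constant. Then (16) becomes

$$dF = C(2\pi\sigma^2)^{-N/2} e^{-N[s^2 + (\bar{x} - \tilde{x})^2]/2\sigma^2} s^{N-2}\,ds\,d\bar{x}. \tag{18}$$

From (18) the distribution of $z$ can be deduced. From (13) we
obtain $d\bar{x} = s\,dz$ for a fixed value of $s$. Substituting in (18)
we obtain, for the joint distribution of $s$ and $z$,

$$dF = C(2\pi\sigma^2)^{-N/2} e^{-N s^2 (1 + z^2)/2\sigma^2} s^{N-1}\,ds\,dz. \tag{19}$$

This expression is defined for $s \ge 0$, being identically zero for
$s < 0$ since $s$ is taken as the positive square root of $s^2$. If $s$
is integrated out of (19), the distribution of the single variable $z$
is obtained. To perform this integration, let

$$y = s(1 + z^2)^{1/2}, \qquad ds = (1 + z^2)^{-1/2}\,dy,$$

and integrating with respect to $y$ from 0 to $\infty$, we have

$$C(2\pi\sigma^2)^{-N/2}(1 + z^2)^{-N/2}\,dz \int_0^{\infty} e^{-N y^2/2\sigma^2} y^{N-1}\,dy,$$

which reduces to

$$K(1 + z^2)^{-N/2}\,dz,$$

where, as shown in § 4 of Chapter II,

$$\frac{1}{K} = \int_{-\infty}^{\infty} (1 + z^2)^{-N/2}\,dz = \frac{\pi^{1/2}\,\Gamma\!\left(\frac{N - 1}{2}\right)}{\Gamma\!\left(\frac{N}{2}\right)}. \tag{20}$$

Therefore, the distribution function for “Student’s” $z$ is

$$dF = \frac{\Gamma\!\left(\frac{N}{2}\right)}{\pi^{1/2}\,\Gamma\!\left(\frac{N - 1}{2}\right)} (1 + z^2)^{-N/2}\,dz. \tag{21}$$
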

The curve is symmetrical with mean zero and infinite range. It is
quite different, however, in mathematical character from the normal
curve although it approaches this form as $N \to \infty$. (Cf. § 9.)
From the viewpoint of sampling theory the important property of
(21) is its independence of $\sigma$. The revolutionary character of
this property is revealed in certain applications that involve drawing
probable inferences from small samples, say from a sample of $N = 10$.

Using (14) Fisher modified (21) and obtained the distribution
of $t$ which is the one now widely applied. Before discussing the
$t$-distribution, we shall give the details of Fisher’s derivation of
(21) and consider the separate distributions of $\bar{x}$, $s^2$, and
$s$.
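
The independence of σ noted above can be made concrete. In the sketch
below, which is our own illustration with hypothetical names and
values, the same ten random deviations are scaled by two very different
standard deviations; since z is a ratio of two quantities that both
scale with σ, the two samples yield the same z apart from rounding.

```python
import math
import random


def student_z(xs, universe_mean):
    """z = (sample mean - universe mean) / s, with s computed with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (mean - universe_mean) / s


rng = random.Random(1)
deviations = [rng.gauss(0.0, 1.0) for _ in range(10)]  # N = 10 unit deviations
universe_mean = 5.0                                    # hypothetical universe mean

z_small = student_z([universe_mean + 1.0 * e for e in deviations], universe_mean)
z_large = student_z([universe_mean + 50.0 * e for e in deviations], universe_mean)
print(z_small, z_large)
```

No knowledge of σ enters the computation of z, which is why (21) can
be referred to without estimating it.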

6. Fisher’s Derivation. Making use of the geometrical method
employed by Fisher we shall imagine an $N$-dimensional space in
which we take the origin at the point $O(\tilde{x}, \tilde{x}, \cdots, \tilde{x})$
and rectangular axes $Ou_1, Ou_2, \cdots, Ou_N$. A point can be located
in a space of a specified number of dimensions by associating with the
point a set of numbers. Therefore, we may represent the sample by the
point $P(u_1, u_2, \cdots, u_N)$ where $u_i = x_i - \tilde{x}$. Although
it is impossible to visualize a space of $N$ dimensions for $N > 3$ we
will carry through the argument for the general case by analogy with
the case for $N = 3$. So we consider the latter case first.

When $N = 3$, the sample is represented by the point $P(u_1, u_2, u_3)$
and we have the mean $\bar{u}$ and variance $s^2$ defined by

(a)  $u_1 + u_2 + u_3 = 3\bar{u}$

and

(b)  $(u_1 - \bar{u})^2 + (u_2 - \bar{u})^2 + (u_3 - \bar{u})^2 = 3s^2$.

Fig. 20

For an assigned $\bar{u}$, (a) represents a plane; and, for an assigned
pair of values of $(\bar{u}, s)$, (b) represents a sphere with center at
the point $M(\bar{u}, \bar{u}, \bar{u})$. The line

(c)  $u_1 = u_2 = u_3$

has direction cosines each equal to $1/(3)^{1/2}$ and is normal to the
plane (a). The perpendicular distance of $P$ from this line is

$$MP = s(3)^{1/2},$$

as can be seen from (b). We require the probability, to within
infinitesimals of order higher than $d\bar{u}\,ds$, of getting a sample
of $N = 3$ independent values of $u$ which will simultaneously yield
values of $\bar{u}$ and $s$ which lie within the region bounded by
$\bar{u}$, $\bar{u} + d\bar{u}$ and $s$, $s + ds$. Following the method
of § 5, an element of this probability density is given by the
expressions

$$dF = (2\pi\sigma^2)^{-3/2} e^{-(u_1^2 + u_2^2 + u_3^2)/2\sigma^2}\,dv$$

where

$$dv = du_1\,du_2\,du_3.$$

As the sample point $P(u_1, u_2, u_3)$ varies, $\bar{u}$ and $s$ also
vary. Corresponding to different values of $s$ there are a set of
concentric spheres defined by (b) all having the same center. Since the
plane (a) passes through the common center of the spheres, the region
$dv$ is a shell between concentric spheres of radii $\sqrt{3}\,s$ and
$\sqrt{3}\,(s + ds)$. To use a homely illustration, $dv$ corresponds to
one of the successive layers in an onion. Our problem is to express
$dv$ in terms of $\bar{u}$, $s$, $d\bar{u}$, and $ds$. Now the line (c)
meets the plane (a) at $M$ and the distance $OM$ is

$$OM = (3)^{1/2}\,\bar{u},$$

so we have the differential element

$$d(OM) = (3)^{1/2}\,d\bar{u}.$$

Since the plane (a) passes through $M$, the intersection of the plane
and sphere is a great circle with center at $M$ and radius equal to
$s(3)^{1/2}$. The area of this circle is

$$A = 3\pi s^2$$

and the differential element $dA$ is

$$dA = 6\pi s\,ds.$$

Therefore, within infinitesimals of higher order,

$$dv = dA\,d(OM) = C_1 s\,ds\,d\bar{u},$$

where here and hereafter, in this section, the $C$’s are constants. So,
the required probability is

$$dF = C_2 e^{-3(s^2 + \bar{u}^2)/2\sigma^2} s\,ds\,d\bar{u}.$$

Passing now to the general case involving $N$-space, let $P$ be the
point representing the sample $(u_1, u_2, \cdots, u_N)$. Then $PM$ is
the perpendicular from $P$ upon the line

(d)  $u_1 = u_2 = \cdots = u_N$

and we have

$$OM = (N)^{1/2}\,\bar{u}, \qquad OP^2 = \sum u_i^2,$$

$$MP^2 = OP^2 - OM^2 = \sum u_i^2 - N\bar{u}^2 = N s^2.$$

In $N$-space, the plane (a) generalizes into the hyperplane

(e)  $u_1 + u_2 + \cdots + u_N = N\bar{u}$

and the sphere (b) generalizes into the hypersphere

(f)  $\sum (u_i - \bar{u})^2 = N s^2$

with radius $MP = (N)^{1/2} s$ and center at
$(\bar{u}, \bar{u}, \cdots, \bar{u})$. Now, the hyperplane (e) will
intersect the hypersphere (f) in an $(N - 1)$-dimensional hypersphere
to correspond to the circle for the case $N = 3$. Consequently, for a
given pair of values of $\bar{u}$ and $s$, the point $P$ will lie on an
$(N - 1)$-dimensional hypersphere orthogonal to the line $OM$. The
volume of this $(N - 1)$-hypersphere is given by

$$A = C_3 (\sqrt{N}\,s)^{N-1}$$

and so

$$dA = C_4 s^{N-2}\,ds.$$

Therefore, the volume $dv = du_1\,du_2 \cdots du_N$ between two
concentric spheres of radius $\sqrt{N}\,s$ and $\sqrt{N}\,(s + ds)$ is
approximately

$$dv = dA\,d(OM) = C_5 s^{N-2}\,ds\,d\bar{u}.$$

Since $du_i = dx_i$ and $d\bar{u} = d\bar{x}$, (17) is established.
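
The relations OM = √N ū and MP² = OP² − OM² = Ns², on which the
geometrical argument rests, can be checked numerically for any sample
point. The sketch below is only such a check; the function name and the
sample point are our own assumptions.

```python
import math


def decompose(us):
    """Return OM, OP^2, MP^2 and N*s^2 for the sample point P(u_1, ..., u_N)."""
    n = len(us)
    u_bar = sum(us) / n
    om = math.sqrt(n) * u_bar                      # distance of M from the origin
    op2 = sum(u * u for u in us)                   # squared distance OP
    mp2 = op2 - om ** 2                            # Pythagorean relation
    n_s2 = sum((u - u_bar) ** 2 for u in us)       # N times the sample variance
    return om, op2, mp2, n_s2


om, op2, mp2, n_s2 = decompose([1.0, -2.0, 0.5, 3.0])
print(om, op2, mp2, n_s2)
```

For the point shown, OP² = 14.25 and OM² = 1.5625, so that both MP²
and Ns² equal 12.6875.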

7. Distributions of $\bar{x}$, $s^2$, and $s$, Taken Singly. It is
clear that (18) may be written as follows:

$$dF = \text{Const.} \times e^{-N[s^2 + (\bar{x} - \tilde{x})^2]/2\sigma^2} (s^2)^{(N-3)/2}\,d(s^2)\,d\bar{x} = G(\bar{x})\,d\bar{x} \times H(s^2)\,d(s^2). \tag{22}$$

From this factored form it follows that

(a) The law of distribution, $G(\bar{x})$, of sample means from a
normal universe is given by

$$G(\bar{x}) = k\,e^{-N(\bar{x} - \tilde{x})^2/2\sigma^2}, \tag{23}$$

it being fairly obvious from the form of $G(\bar{x})$ that

$$k = \left(\frac{N}{2\pi\sigma^2}\right)^{1/2}. \tag{24}$$

It may also be evaluated by imposing the condition that

$$\int_{-\infty}^{\infty} G(\bar{x})\,d\bar{x} = 1.$$

Evidently, $G(\bar{x})$ is a normal distribution with mean equal to
$\tilde{x}$ and standard deviation equal to $\sigma/N^{1/2}$, a result
already familiar from Chapter VI.

(b) The variance, $s^2$, of a sample is distributed according to

$$H(s^2) = k_1 e^{-N s^2/2\sigma^2} (s^2)^{(N-3)/2}, \tag{25}$$

where (see § 4, Chapter II)

$$k_1 = \frac{(N/2\sigma^2)^{(N-1)/2}}{\Gamma\!\left(\frac{N - 1}{2}\right)}. \tag{26}$$

Thus the distribution of the variance was found by first finding the
simultaneous distribution of the variance and mean. Clearly,
$H(s^2)$ is a Pearson Type III curve with range limited at one end,
$s^2 = 0$, and not at the other, $s^2 = \infty$.

(c) The variance, $s^2$, and the mean, $\bar{x}$, are distributed quite
independently, that is,

$$F(\bar{x}, s^2) = G(\bar{x})H(s^2).$$

It has recently been proved by Geary that a necessary and sufficient
condition that $\bar{x}$ and $s^2$ from samples of $N$ values of $x$ be
independent in the probability sense is that the $x$’s be normally
distributed in the parent universe.
In § 2, the mean of the sampling distribution of $s^2$ from an
arbitrary universe was obtained. It is interesting to verify that result
in the present case where we know the distribution function. The mean
of the distribution of variances of samples of $N$ from a normal
universe is given by

$$E(s^2) = \int_0^{\infty} H(s^2)\,s^2\,d(s^2),$$

where $H(s^2)$ is defined in (25). So we have

$$E(s^2) = k_1 \left(\frac{2\sigma^2}{N}\right)^{(N+1)/2} \Gamma\!\left(\frac{N + 1}{2}\right) = \frac{N - 1}{N}\sigma^2.$$

The standard deviation of the $H(s^2)$ distribution is, approximately,
$\sigma^2 (2/N)^{1/2}$.

The distribution of the standard deviations of samples of $N$ from
a normal universe is readily found from (25) and (22) to be

$$h(s) = 2k_1 e^{-N s^2/2\sigma^2} s^{N-2}. \tag{27}$$

So its mean value is given by

$$E(s) = \int_0^{\infty} h(s)\,s\,ds,$$

which yields the result

$$E(s) = k_1 \left(\frac{2\sigma^2}{N}\right)^{N/2} \Gamma\!\left(\frac{N}{2}\right).$$

Upon substituting the value of $k_1$ given in (26), the above expression
becomes

$$E(s) = \left(\frac{2}{N}\right)^{1/2} \frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(\frac{N - 1}{2}\right)}\,\sigma.$$

If we denote this coefficient of $\sigma$ by $b(N)$ we have the relation

$$\sigma = \frac{E(s)}{b(N)}$$


which was promised in § 3. Romanovsky showed that

$$b(N) = 1 - \frac{3}{4N} - \frac{7}{32N^2} - \cdots.$$

Table 11 gives values of the reciprocal of $b(N)$ for a few values of
$N$.

Table 11

      N     1/b(N)
      2     1.772
      3     1.382
      4     1.253
      5     1.189
      6     1.151
      7     1.126
      8     1.108
      9     1.094
     10     1.084
     20     1.040
     30     1.026
     50     1.015
    100     1.008

Romanovsky also deduced the standard deviation of the $h(s)$
distribution to be

$$\sigma_s = \sigma\left(\frac{1}{2N} - \frac{1}{8N^2} - \frac{3}{16N^3} - \cdots\right)^{1/2}.$$

The approximate value

$$\sigma_s = \sigma\left(\frac{1}{2N}\right)^{1/2} \tag{29}$$

is frequently used in practice and this is the basis for the common
statement that the standard error of a standard deviation is
$1/(2)^{1/2}$ that of a mean.

The modal value of $s$, easily found* by differentiating $h(s)$, is

$$\check{s} = \sigma\left(\frac{N - 2}{N}\right)^{1/2}. \tag{30}$$

If we make the substitution $y = s - \check{s}$, then the distribution
of $y$ is, to a first approximation, the normal curve

$$\text{Const.} \times e^{-N y^2/\sigma^2} \tag{31}$$

with standard deviation $\sigma/(2N)^{1/2}$.
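
The entries of Table 11 follow directly from the expression for
E(s)/σ, namely b(N) = (2/N)^{1/2} Γ(N/2)/Γ((N − 1)/2). The Python
sketch below is our own check, with hypothetical function names; it
evaluates this expression through the logarithm of the Gamma function
and sets beside it the first three terms of Romanovsky’s series.

```python
import math


def b(n):
    """b(N) = sqrt(2/N) * Gamma(N/2) / Gamma((N-1)/2), for N >= 2."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / n) * math.exp(log_ratio)


def b_series(n):
    """First three terms of the series 1 - 3/(4N) - 7/(32 N^2) - ..."""
    return 1.0 - 3.0 / (4.0 * n) - 7.0 / (32.0 * n * n)


for n in (2, 3, 4, 5, 10, 20, 30, 50, 100):
    print(n, round(1.0 / b(n), 3), round(1.0 / b_series(n), 3))
```

The series is poor for the smallest samples and improves rapidly as N
grows, which is why the exact expression is preferred for tabulation.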

8. The $(\bar{x}, s)$-Frequency Surface. We may regard $F(\bar{x}, s)$
as describing a frequency surface if the total volume under the surface
represents the expected frequency of the means and standard deviations
of all possible samples of size $N$. In depicting this surface it is
convenient to let $\bar{u} = \bar{x} - \tilde{x}$ so that the origin of
$\bar{u}$ is at $\bar{x} = \tilde{x}$. Since

$$\int_0^{\infty}\!\int_{-\infty}^{\infty} F(\bar{x}, s)\,d\bar{u}\,ds = 1,$$

then the volume under the surface over a closed contour in the
$\bar{u}s$-plane represents the proportion or percentage of samples
whose means and standard deviations fall simultaneously within the
ranges defined by the boundary of the given contour. In an illuminating
paper by Deming and Birge two such frequency surfaces are represented.
These are reproduced in Figure 21, one for a small value of $N$ and the
other for a comparatively large value of $N$.

* If we make $h(s)$ a maximum for variation in $\sigma$ we find that
$\sigma = N^{1/2} s/(N - 1)^{1/2}$, or $s = \sigma\{(N - 1)/N\}^{1/2}$
(cf. Rider).

Fig. 21. The Surface $F(\bar{u}, s)$ Illustrated by Sections

As the authors point out, the highest point of the surface has the
coordinates $\bar{u} = 0$, $s = \sigma[(N - 2)/N]^{1/2}$. Because of the
independence of $\bar{x}$ and $s$, all plane sections $s$ = constant
will be normal curves with standard deviations equal to
$\sigma/N^{1/2}$. The $\bar{u}$ = constant sections will be skew curves
whose equations are given by $h(s)$. They will all have the same mean
and mode. As $N$ increases, their mean and mode approach coincidence
with the value $\sigma$ while the curves lose their skewness and become
normal with center at $s = \sigma$ and standard deviations equal to
$\sigma/(2N)^{1/2}$. As $N$ increases, the surface becomes more and
more concentrated about the point $\bar{u} = 0$, $s = \sigma$.

9. Fisher’s $t$-Distribution. Substituting (14) into (21) and
replacing $N - 1$ by $n$ we obtain

$$F_n(t) = K_n \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \tag{32}$$

where $1/K_n = n^{1/2} B(n/2, 1/2)$, $B$ being the Beta function.

Inasmuch as (32) is independent of $\sigma$, it can be used in
situations in which the value of $\sigma$ is unknown. The quantity $t$
involves no hypothetical quantities, being completely expressible in
terms of the variates.

In 1925, “Student” published in Metron an extensive table of
the probability integral $\int_{-\infty}^{t} F_n(t)\,dt$. More recently,
Fisher has given a short table of the probability $P$ of occurrence of
deviations outside $\pm t$ for values of $t$ and $n$ commonly met in
applications of small sample theory. Let

$$P_n(t) = 2\int_0^{t} F_n(t)\,dt.$$

Then the probability $P$ tabulated by Fisher is

$$P = 1 - P_n(t). \tag{33}$$

Fisher’s table gives successive columns showing for each value of $n$,
from $n = 1$ to $n = 30$, the values of $t$ for which $P$ takes the
values given at the head of the columns. A general idea of the table
may be obtained from the portion which we have reproduced* in Table 12.


Table 12. Values of t from Table IV of Fisher’s Text

     n    P = .9     .7       .5       .1       .05      .01
     3     .137     .424     .765    2.353    3.182    5.841
     4     .134     .414     .741    2.132    2.776    4.604
     5     .132     .408     .727    2.015    2.571    4.032
     6     .131     .404     .718    1.943    2.447    3.707
     8     .130     .399     .706    1.860    2.306    3.355
    10     .129     .397     .700    1.812    2.228    3.169
    15     .128     .393     .691    1.753    2.131    2.947
    20     .127     .391     .687    1.725    2.086    2.845
    30     .127     .389     .683    1.697    2.042    2.750
     ∞     .1257    .3853    .6745   1.6449   1.9600   2.5758


The number $n$, with which to enter the table, is determined by the
number of degrees of freedom involved in the available estimate (§ 3)
of $\sigma^2$. In testing null hypotheses the rule given in § 12 of
Chapter VI
may be used, where, of course, P^ is to be replaced now by P. 
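
Where printed tables are not at hand, P of (33) can be had by direct
numerical integration of (32). The following Python sketch is ours and
not the method by which the published tables were computed; it uses
Simpson’s rule, and the function names and the number of steps are
assumptions.

```python
import math


def t_density(t, n):
    """F_n(t) of (32), with 1/K_n = sqrt(n) * B(n/2, 1/2)."""
    log_beta = math.lgamma(n / 2.0) + math.lgamma(0.5) - math.lgamma((n + 1) / 2.0)
    k_n = 1.0 / (math.sqrt(n) * math.exp(log_beta))
    return k_n * (1.0 + t * t / n) ** (-(n + 1) / 2.0)


def two_sided_p(t, n, steps=2000):
    """P = 1 - 2 * integral of F_n from 0 to |t|; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, n) + t_density(upper, n)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0      # Simpson weights 4, 2, 4, ...
        total += weight * t_density(i * h, n)
    return 1.0 - 2.0 * total * h / 3.0


print(round(two_sided_p(2.776, 4), 4))    # tabled under P = .05 for n = 4
print(round(two_sided_p(3.169, 10), 4))   # tabled under P = .01 for n = 10
print(round(two_sided_p(0.765, 3), 4))    # tabled under P = .5 for n = 3
```

Each call should return, to about three decimals, the value of P at
the head of the column from which the corresponding t was taken.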

* With Fisher’s permission and that of his publishers, Oliver and Boyd.

The distribution of $t$ (as well as that of $z$) approaches the normal
type as $n \to \infty$. This may be established as follows. Using
Stirling’s approximation on the coefficient $K_n$ in (32), we obtain,
after some algebraic simplification, the following expression:

$$K_n = (n\pi e)^{-1/2} \left(\frac{n - 1}{n - 2}\right)^{(n-2)/2} \left(\frac{n - 1}{n - 2}\right)^{1/2} \left(\frac{n - 1}{2}\right)^{1/2}.$$

From this it is easy to show that

$$\lim_{n \to \infty} K_n = (2\pi)^{-1/2}.$$

The rest of the $t$ function may be written as

$$\left(1 + \frac{t^2}{n}\right)^{-n/2} \left(1 + \frac{t^2}{n}\right)^{-1/2},$$

which, when $n = \infty$, becomes $e^{-t^2/2}$. Therefore,

$$\lim_{n \to \infty} F_n(t) = (2\pi)^{-1/2} e^{-t^2/2}.$$

The entries in the last line of Fisher’s table, corresponding to
$n = \infty$, are the deviations from the mean of a normal curve with
unit standard deviation.

According to “Student,” the distribution of $z$ tends to approach
a normal curve with a standard deviation of $(N - 3)^{-1/2}$ for large
values of $N$. Deming and Birge (loc. cit.) have suggested that the
distribution tends to approach normality with $(N - 3/2)^{-1/2}$
standard deviation. Anyhow, for large values of $N$, $(N - 3)^{1/2} z$
would be approximately normally distributed about zero with unit
standard deviation. Since

$$(N - 3)^{1/2} z = \left(\frac{N - 3}{N - 1}\right)^{1/2} t \quad \text{and} \quad t = \frac{(\bar{x} - \tilde{x})(N - 1)^{1/2}}{s},$$

it is frequently satisfactory in applications to refer

$$\frac{(\bar{x} - \tilde{x})(N - 3)^{1/2}}{s} \tag{34}$$

to a normal probability scale when $N > 30$.

For large values of $N$, (34) represents so small a refinement over
(22) of Chapter VI that the additional computation seems unwarranted.
So when $N$ considerably exceeds 30 the older procedure of
replacing $\sigma$ by $s$ and treating $t = (\bar{x} - \tilde{x})(N)^{1/2}/s$
as though it were normally distributed with unit standard deviation is
not appreciably erroneous.

10. Difference Between Two Means. Fisher demonstrated that
(32) has a much wider range of application than the problem for
which it was designed. He showed that the $t$-distribution is
applicable whenever we are dealing with a normally distributed variate
whose standard deviation is not known exactly but is independently
estimated from observations amounting to $n$ degrees of freedom.
The scheme by which the “Student” idea is made available to other
problems consists in constructing a variable $t$ in the nature of a
fraction whose numerator is any statistic normally distributed and
whose denominator is the square root of an independently distributed
and unbiased estimate of the variance of the numerator involving $n$
degrees of freedom. Thus the $t$-distribution has been found useful
in such problems as testing the significance of the difference
between two means and testing hypotheses regarding regression
coefficients.

Let $\bar{x}_1$, $\bar{x}_2$ be the means and $s_1$, $s_2$ the standard
deviations of two independent samples of $N_1$ and $N_2$ variates,
respectively, from a normal universe with mean $\tilde{x}$ and variance
$\sigma^2$. According to (10) of Chapter VI the variance of the
difference between the two means is $\sigma^2 (N_1 + N_2)/N_1 N_2$. Then
it can be proved that the variable

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma} \left(\frac{N_1 N_2}{N_1 + N_2}\right)^{1/2} \tag{35}$$

is normally distributed with unit standard deviation. However, in
most practical problems $\sigma$ is unavailable and must be estimated
from the samples. Using the unbiased estimate defined in (5), the above
formula becomes

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}} \left(\frac{N_1 N_2}{N_1 + N_2}\right)^{1/2}. \tag{36}$$

Fisher showed that (36) is distributed in accord with (32) for
$n = N_1 + N_2 - 2$, and we can find from Fisher’s table of $P$ the
probability of a greater difference between the means than that
observed.

As $N_1$ and $N_2$ become large, $(N_1 + N_2)/(N_1 + N_2 - 2)$ tends
toward unity and (36) tends toward the value

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\left(\dfrac{s_1^2}{N_2} + \dfrac{s_2^2}{N_1}\right)^{1/2}}. \tag{37}$$

Since (36) is asymptotically normally distributed, the older procedure
of referring (37) to a normal probability scale in testing a null
hypothesis that two samples are from the same universe would not be
invalid to any appreciable extent for large values of $N_1$ and $N_2$.
The present writer has recently called attention to an erroneous
formula which is commonly used in place of (37).

If one of the samples, say $N_2$, is so much larger than the other that
it tends toward the universe, then $\bar{x}_2$ tends toward $\tilde{x}$
and $s_2$ tends toward $\sigma$. So, under these conditions, (37) tends
toward

$$t = \frac{(\bar{x}_1 - \tilde{x})\sqrt{N_1}}{\sigma},$$

which, if the subscripts are dropped, is the formula used in testing a
null hypothesis that a given sample comes from a normal universe
with a proposed mean. When $N_1 = N_2 = N$, (36) reduces to

$$t = (\bar{x}_1 - \bar{x}_2) \left[\frac{N - 1}{s_1^2 + s_2^2}\right]^{1/2}. \tag{38}$$


Inasmuch as we do not ordinarily know whether a sample is drawn
from a normal universe or some other type of universe, a question
quite naturally arises as to whether the procedure inaugurated by
“Student” and extended by Fisher is applicable to small samples
from non-normal universes. The question may be considered partially
answered by Bartlett and others who have shown that it gives a good
approximation for considerable departures from normality in the
sampled universe. However, a word of caution seems to be in order lest
the new procedure be oversold in the applications by completely
neglecting the underlying assumptions of normality in the universe and
randomness of the samples.

The following examples, cited by Rietz, illustrate the “Student”
theory.

Example 1. Suppose a random sample of $N = 5$ is obtained from a
hypothetical normal universe whose mean is $\tilde{x} = 2$. It is found
that $\bar{x} = 3$ and $\hat{\sigma} = 1$ for the sample. What is the
probability that one would obtain a sample of five for which $\bar{x}$
would differ numerically from $\tilde{x}$ by as much as unity?

Solution. From (12), $t = \sqrt{5} = 2.236$. Entering Fisher’s table
for $n = 4$, we find the probability $P$ between .1 and .05. Reference
to the more extensive table in Metron gives $P = .0892$ for the
probability of a discrepancy as large as the one observed. It is
interesting to compare this result with what would be obtained by
reference to a normal probability scale. We find $P = .0254$ for
a deviation outside $t = \pm 2.236$. In terms of the odds that a mean,
$\bar{x}$, will deviate numerically more than 1 from theory, the
contrast is more striking. Thus, under the “Student” theory we should
say that the odds are 10,000 to 892 or roughly 11 to 1 against a
deviation as large as or larger than the one observed. Under the normal
theory the odds are 10,000 to 254, or about 40 to 1 against such a
deviation.

Example 2. The following data represent the yields in bushels of Indian
corn on ten subdivisions of equal areas of two agricultural plots in
which Plot 1 was a control plot treated the same as Plot 2, except for
the amount of phosphorus applied as a fertilizer.

        Plot 1     Plot 2
         6.2        5.6
         5.7        5.9
         6.5        5.6
         6.0        5.7
         6.3        5.8
         5.8        5.7
         5.7        6.0
         6.0        5.5
         6.0        5.7
         5.8        5.5
        -----      -----
  Sum   60.0       57.0

$$\bar{x}_1 = \frac{60.0}{10} = 6.0, \qquad \bar{x}_2 = \frac{57.0}{10} = 5.7.$$

Is there a significant difference between the yields on the two plots,
using the difference between their means as a criterion of judgment?


Solution.

    $s_1^2 = \frac{.64}{10} = .064, \qquad s_2^2 = \frac{.24}{10} = .024.$

Substitution in (38) gives

    $t = (.3)(10.113) = 3.034.$

Entering "Student's" tables in Metron (loc. cit.) at $n = 18$, we find P = .0072 for the probability that t will fall outside the range $-3.034$ and $+3.034$. Hence a null hypothesis that the samples are from the same universe would be refuted by the test for both the .05 and .01 levels of significance. In other words, our conclusion is that, on the levels of significance adopted, there is a significant difference between the yields on the plots.
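The arithmetic of Example 2 can be retraced with a short computation. This is a sketch only: it assumes that (38) is the usual pooled two-sample t statistic with $N_1 + N_2 - 2$ degrees of freedom, and it takes the Plot 1 yields in the reading that reproduces the printed total 60.0 and the sum of squares .64. The helper names are illustrative.

```python
import math

# Yields from Example 2 (Plot 1 in the reading that sums to 60.0).
plot1 = [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8]
plot2 = [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """Pooled two-sample t and its degrees of freedom."""
    n = len(a) + len(b) - 2
    pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / n
    std_err = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / std_err, n

def t_density(x, n):
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)

def simpson(f, a, b, steps=2000):
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

t, n = pooled_t(plot1, plot2)
# Two-tailed probability of a value outside (-t, +t).
p = 1 - 2 * simpson(lambda x: t_density(x, n), 0.0, abs(t))
print(round(t, 3), n, round(p, 4))
```

The computed t agrees with 3.034, and the two-tailed probability falls near the tabled .0072.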


11. Fisher's z-Distribution. Suppose $u^2$ and $v^2$ are two independent and unbiased estimates of the variance $\sigma^2$ of a variable x which is normally distributed. If these estimates are based upon samples of $N_1$ and $N_2$, respectively, or upon $n_1$ and $n_2$ degrees of freedom, then we have

    $u^2 = \frac{1}{N_1 - 1}\sum_{i=1}^{N_1}(x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1}\sum_{i=1}^{n_1+1}(x_{1i} - \bar{x}_1)^2$

    $v^2 = \frac{1}{N_2 - 1}\sum_{i=1}^{N_2}(x_{2i} - \bar{x}_2)^2 = \frac{1}{n_2}\sum_{i=1}^{n_2+1}(x_{2i} - \bar{x}_2)^2$

in which $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples. The symbols by which $u^2$ and $v^2$ would be denoted in the previous notation are too unwieldy in the present discussion.

In constructing a test of significance for the difference between two sample variances it might seem logical to form the difference $w = u^2 - v^2$ and seek the distribution function of w. However, such a procedure is impractical because of the mathematical difficulty involved in determining this function. Fisher circumvented this difficulty by building a statistic, z, defined by

(39)    $z = \tfrac{1}{2}(\log_e u^2 - \log_e v^2) = \log_e \frac{u}{v}$

whose distribution function, $G(z)$, he obtained and which proved to have extremely wide application. To derive $G(z)$ we make use of the distribution $H(s^2)$ given in (25), replacing $N - 1$ by $n$ and $s^2$ by $\frac{n}{n+1}u^2$ (see § 3). After this modification, (25) becomes

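(40)    $H(u^2)\,d(u^2) = \frac{n^{n/2}}{2^{n/2}\sigma^{n}\,\Gamma(n/2)}\,(u^2)^{(n-2)/2}\,e^{-nu^2/2\sigma^2}\,d(u^2).$

Since $u^2$ and $v^2$ are independent their joint distribution is

(41)    $K\,(u^2)^{(n_1-2)/2}(v^2)^{(n_2-2)/2}\,e^{-(n_1u^2+n_2v^2)/2\sigma^2}\,d(u^2)\,d(v^2)$

where

    $K = \frac{n_1^{\,n_1/2}\,n_2^{\,n_2/2}}{2^{(n_1+n_2)/2}\,\sigma^{\,n_1+n_2}\,\Gamma(n_1/2)\,\Gamma(n_2/2)}.$

From (39) we have

(42)    $u^2 = v^2 e^{2z}$

and for a fixed value of $v^2$,

(43)    $d(u^2) = 2v^2 e^{2z}\,dz.$

Using (42) and (43) in (41) we obtain

(44)    $2K\,e^{n_1 z}\,(v^2)^{(n_1+n_2-2)/2}\,e^{-(n_1e^{2z}+n_2)v^2/2\sigma^2}\,d(v^2)\,dz$

for the joint distribution of $v^2$ and z. Integrating (44) with respect to $v^2$ between the limits 0 and $\infty$ and making use of the Gamma function we obtain the distribution of z,

(45)    $G(z)\,dz = \frac{2\,n_1^{\,n_1/2}\,n_2^{\,n_2/2}\,\Gamma\!\left(\frac{n_1+n_2}{2}\right)}{\Gamma(n_1/2)\,\Gamma(n_2/2)}\cdot\frac{e^{n_1 z}\,dz}{(n_1e^{2z}+n_2)^{(n_1+n_2)/2}}.$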
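As a numerical check on the distribution of z, the sketch below evaluates the density in the form $2\,n_1^{n_1/2}n_2^{n_2/2}\,\Gamma(\tfrac{n_1+n_2}{2})\,e^{n_1 z} / \{\Gamma(\tfrac{n_1}{2})\Gamma(\tfrac{n_2}{2})(n_1e^{2z}+n_2)^{(n_1+n_2)/2}\}$, which is the standard form assumed here for (45). It confirms that the total area is unity and that, for $n_1 = n_2 = 9$, the area beyond z = .58 is close to .05. Note that $\sigma$ does not enter. The function names and the integration range are choices made for this illustration.

```python
import math

def log_G(z, n1, n2):
    """Logarithm of the density of Fisher's z for (n1, n2) degrees of freedom."""
    return (math.log(2)
            + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
            + math.lgamma((n1 + n2) / 2)
            - math.lgamma(n1 / 2) - math.lgamma(n2 / 2)
            + n1 * z
            - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2 * z) + n2))

def G(z, n1, n2):
    return math.exp(log_G(z, n1, n2))

def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# The density is negligible outside (-8, 8) for these degrees of freedom.
total_area = simpson(lambda z: G(z, 9, 9), -8.0, 8.0)
upper_tail = simpson(lambda z: G(z, 9, 9), 0.58, 8.0)
print(round(total_area, 6), round(upper_tail, 4))
```

Working with the logarithm of the density avoids overflow in the Gamma functions when the degrees of freedom are large.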


The function $G(z)$ has the important property that it depends solely upon $n_1$ and $n_2$, not at all upon the variance of the sampled universe. Fisher's z should not be confused with the z-distribution of "Student."

The distribution function of z is extremely general, including as special cases the $\chi^2$-distribution, the t-distribution of "Student" and Fisher, and the normal distribution. Rider has made easily available the transformations and substitutions by which these special cases can be obtained from (45).

The positive part of the curve for $z = \log_e (u/v)$ is the same as the negative part for $z = \log_e (v/u)$. Since it is optional which estimate is considered as $u^2$, it is necessary, in tabulating the probability integral of $G(z)$, to consider only positive values of z by making $u^2$ the larger variance estimate (based on $n_1$ degrees of freedom).

Let $Q = \int_{-\infty}^{z_0} G(z)\,dz$ and let $P = 1 - Q$. Thus P is the probability that $z > z_0$. In his book, Fisher has given values of $z_0$ corresponding to the probabilities P = .05 and .01 for various combinations of $n_1$ and $n_2$. These values, $z_0$, are called the "5% and 1% points" and are used as critical values in judging significance. It should be noticed that Fisher's "points" are based on the area of the whole curve and therefore they should not be confused with the "5% and 1% levels of significance" previously used. In the latter sense, Fisher's "points" would be 10% and 2% "levels of significance." In other words, a 5% point means a value of z such that one "tail" under the curve is .05, whereas a 5% level of significance meant a value of t such that the sum of both "tails" (outside $\pm t$) is .05. It is hoped that tables of 5% and 1% levels of significance for z will sometime be available.




12. Significance of Difference Between Variances. The usual hypothesis tested by the z-test is that $u^2$ and $v^2$ are estimates of one and the same population variance and therefore that z = 0. The significance of the divergence of the observed value of z from zero is the crux of the test. Small values of z mean a tenable hypothesis whereas values of z larger than $z_0$ refute the hypothesis. If for P = .05 (or .01) the observed value of z, as computed from the samples in accordance with (39), is larger than $z_0$, the hypothesis is to be rejected and the conclusion is that the samples come from universes with different variances.

Logically, the z-test should be applied before testing the difference between two means since the latter test depends on the equality of the population variances.

To avoid the troublesome logarithmic computation involved in (39) Snedecor has published tables which transform Fisher's 5% and 1% points into the ratio $u^2/v^2$, where $u^2 > v^2$. Snedecor calls this ratio F in honor of Fisher.* Therefore,

    $F = \frac{u^2}{v^2} = e^{2z}$

where $u^2$ is to be chosen the larger of the two given variance estimates. This table is reproduced in the Appendix. (See Table II.)


Example 3. In Example 2 suppose we wish to test the assumption, which was made there, that the two samples come from universes with equal variance. We have

    $u^2 = \frac{n_1+1}{n_1}\,s_1^2 = \frac{.64}{9} = .0711$

    $v^2 = \frac{n_2+1}{n_2}\,s_2^2 = \frac{.24}{9} = .0267$

    $F = \frac{.0711}{.0267} = 2.663$

    $z = .5\log_e F = 1.1513\log_{10} F = .49.$

Entering Fisher's table (loc. cit.) for $n_1 = n_2 = 9$ we find $z_0 = .58$ for P = .05 and $z_0 = .84$ for P = .01. This means that, if the true value of z were zero, random sampling fluctuations would be expected to give a value of z as great as .84, or greater, once in 100 trials, and a value of z as great as .58, or greater, five times in 100 trials. The observed value of z is .49 and so this value might be accounted for by chance, at either the .05 or .01 points of significance. Using Snedecor's table we find F = 3.18 for P = .05 and F = 5.35 for P = .01. Since the observed value of F is only 2.663, we conclude that we were justified in proceeding with the t-test.

* In their new Statistical Tables Fisher and Yates call it the variance ratio. These tables are published by Oliver and Boyd, London.
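The quantities of Example 3 follow directly from the two sums of squares of Example 2. The sketch below recomputes them; the critical values are simply those quoted in the text, and the variable names are illustrative. Carrying full precision gives F = 2.667 rather than 2.663, the latter arising from the rounded quotients .0711 and .0267.

```python
import math

# Sums of squared deviations about the sample means, from Example 2.
sum_sq_1, sum_sq_2 = 0.64, 0.24
n1 = n2 = 9                      # degrees of freedom of each estimate

u2 = sum_sq_1 / n1               # larger variance estimate
v2 = sum_sq_2 / n2               # smaller variance estimate

F = u2 / v2                      # Snedecor's ratio
z = 0.5 * math.log(F)            # Fisher's z, natural logarithm

# 5% and 1% points as quoted in the text for n1 = n2 = 9.
F_5_percent, F_1_percent = 3.18, 5.35
z_5_percent, z_1_percent = 0.58, 0.84

reject_at_5_percent = F > F_5_percent
print(round(u2, 4), round(v2, 4), round(F, 3), round(z, 2), reject_at_5_percent)
```

Both forms of the test lead to the same decision, since $F = e^{2z}$ and the tabled points correspond in the same way.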

When the samples are large there are two procedures available.

I. $G(z)$ is skew when $n_1 \neq n_2$, but when $n_1 = n_2$ it is symmetrical. When $n_1$ and $n_2$ are large, and also for moderate values when they are equal or nearly equal, one can verify (by taking logarithms) that z is approximately normally distributed with mean zero and variance $\tfrac{1}{2}(1/n_1 + 1/n_2)$. Therefore,

(46)    $t = \frac{z}{\left\{\tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right\}^{1/2}}$

may be referred to a normal probability scale.

II. Let $w = s_1 - s_2$. From (10) of Chapter VI and (29) of this chapter,

    $\sigma_w^2 = \frac{\sigma^2}{2N_1} + \frac{\sigma^2}{2N_2}.$

Then

(47)    $t = \frac{s_1 - s_2}{\sigma\left\{\frac{1}{2N_1} + \frac{1}{2N_2}\right\}^{1/2}}$

is normally distributed about zero with unit standard deviation. An estimate of the supposed common variance is given in (5). Using the square root of this estimate in place of $\sigma$ in (47) and assuming that $N_1$ and $N_2$ are large enough to regard, without appreciable error, the ratio $(N_1 + N_2)/(N_1 + N_2 - 2)$ as unity we obtain

(47a)    $t = \frac{s_1 - s_2}{\left\{\frac{s_1^2}{2N_2} + \frac{s_2^2}{2N_1}\right\}^{1/2}}.$

This value may then be referred to a normal probability scale.
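Procedure I can be illustrated with the figures of Example 3, although nine degrees of freedom is hardly a large sample; the purpose is only to show the arithmetic. In this sketch z is divided by its approximate standard deviation $\{\tfrac12(1/n_1 + 1/n_2)\}^{1/2}$ and the quotient is referred to the normal scale. The one-tail convention is an assumption made here to match Fisher's "points."

```python
import math

def normal_upper_tail(t):
    """Area of the standard normal curve above t."""
    return 0.5 * math.erfc(t / math.sqrt(2))

z = 0.49                 # observed value from Example 3
n1 = n2 = 9              # degrees of freedom of the two estimates

sigma_z = math.sqrt(0.5 * (1 / n1 + 1 / n2))   # approximate standard deviation of z
t = z / sigma_z                                 # normal deviate
p_one_tail = normal_upper_tail(t)
print(round(sigma_z, 4), round(t, 2), round(p_one_tail, 4))
```

The one-tail probability of about .07 exceeds .05, so the approximation agrees with the exact test in not rejecting the hypothesis of equal variances.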

An interesting derivation, using characteristic functions, of a method for testing the significance of the difference between two sample variances has recently been given by A. T. Craig.²¹

13. Analysis of Variance. The test of significance between two independent sample variances (with their appropriate degrees of freedom) is a special case of a general technique, developed by Fisher, for segregating the variance into portions traceable to specific sources. In general, the kind of procedure one attempts to follow in such an analysis can be illustrated by the following scheme.

Let us imagine a individuals $I_1, I_2, \cdots, I_a$, each subjected to b treatments $T_1, T_2, \cdots, T_b$. For example, the I's may be agricultural plots containing different varieties of some plant and the T's may be applications of various kinds or amounts of fertilizers. Or the I's might conceivably be various diabetic patients and the T's varietal insulin treatments. The effects of the T's on the I's yield a set of observations, to be denoted by $x_{jk}$, which vary from one value of I to another for a fixed T and from one value of T to another for a fixed I. Suppose, then, that $N = ab$ independently observed values of a normally distributed variable are classified into a rows and b columns in accordance with some relevant scheme as depicted in Table 13.


Table 13. Matrix of N = ab Independent Values from a Normal Universe

              $T_1$       $T_2$       ...     $T_b$
    $I_1$     $x_{11}$    $x_{12}$    ...     $x_{1b}$
    $I_2$     $x_{21}$    $x_{22}$    ...     $x_{2b}$
    ...       ...         ...         ...     ...
    $I_a$     $x_{a1}$    $x_{a2}$    ...     $x_{ab}$

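The values in each row will vary about the mean of that row and the values in each column will vary about the mean of that column. Let $\bar{x}_{j\cdot}$ denote the mean of the jth row,

(48)    $b\,\bar{x}_{j\cdot} = \sum_{k=1}^{b} x_{jk}, \qquad j = 1, 2, \cdots, a,$

and let $\bar{x}_{\cdot k}$ denote the mean of the kth column,

(49)    $a\,\bar{x}_{\cdot k} = \sum_{j=1}^{a} x_{jk}, \qquad k = 1, 2, \cdots, b.$

(The dot indicates that summation has been effected on the index which it replaces.) Let the mean of the entire set be $\bar{x}$ where

(50)    $ab\,\bar{x} = \sum_{j=1}^{a}\sum_{k=1}^{b} x_{jk}.$

Let the variance in the entire set due to all causes be $Q/ab$ where

(51)    $Q = \sum_{j=1}^{a}\sum_{k=1}^{b} (x_{jk} - \bar{x})^2.$

Now Q can be resolved into three quadratic forms as follows:

(52)    $Q = q_1 + q_2 + q_3$

where

    $q_1 = b\sum_{j=1}^{a}(\bar{x}_{j\cdot} - \bar{x})^2$

    $q_2 = a\sum_{k=1}^{b}(\bar{x}_{\cdot k} - \bar{x})^2$

    $q_3 = \sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x})^2.$

That (52) is an identity in the $N = ab$ values of x can be readily seen as follows:

    $\sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{x})^2 = \sum_{j=1}^{a}\sum_{k=1}^{b}\{(x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x}) + (\bar{x}_{j\cdot} - \bar{x}) + (\bar{x}_{\cdot k} - \bar{x})\}^2$

    $= \sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x})^2 + \sum_{j=1}^{a}\sum_{k=1}^{b}(\bar{x}_{j\cdot} - \bar{x})^2 + \sum_{j=1}^{a}\sum_{k=1}^{b}(\bar{x}_{\cdot k} - \bar{x})^2.$

To show that the cross-product terms vanish consider the term

    $\sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x})(\bar{x}_{j\cdot} - \bar{x}).$

This becomes

    $\sum_{j=1}^{a}(\bar{x}_{j\cdot} - \bar{x})\sum_{k=1}^{b}(x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x}) = \sum_{j=1}^{a}(\bar{x}_{j\cdot} - \bar{x})(b\bar{x}_{j\cdot} - b\bar{x}_{j\cdot} - b\bar{x} + b\bar{x}) = 0.$

A similar demonstration can be made for the other cross-product terms. This is left as an exercise for the student. Since

    $\sum_{j=1}^{a}\sum_{k=1}^{b}(\bar{x}_{j\cdot} - \bar{x})^2 = b\sum_{j=1}^{a}(\bar{x}_{j\cdot} - \bar{x})^2$

    $\sum_{j=1}^{a}\sum_{k=1}^{b}(\bar{x}_{\cdot k} - \bar{x})^2 = a\sum_{k=1}^{b}(\bar{x}_{\cdot k} - \bar{x})^2$

(52) is established.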

The variability between rows is measured by $q_1$ and between columns by $q_2$. The residual variability, freed from the influence of either rows or columns, is measured by $q_3$ and is called interaction (sometimes also discrepance). It may be regarded as the "experimental error" inherent in the experiment and over which no control is attempted. As will be shown later, it is used as a standard against which the variability measured by either $q_1$ or $q_2$ may be tested for significance, when the appropriate number of degrees of freedom are taken into account.

From (51) the number of degrees of freedom in Q is seen to be $N - 1$. Since there are a values of $\bar{x}_{j\cdot}$ the number of degrees of freedom in $q_1$ is $(a - 1)$. Similarly, the number in $q_2$ is $(b - 1)$. This leaves $(N - 1) - \{(a - 1) + (b - 1)\} = (a - 1)(b - 1)$ for $q_3$, a result which may also be deduced from the expression for $q_3$. Another form of argument is as follows. The ab means of rows and columns form an $(a \times b)$-fold table of $(a - 1)(b - 1)$ degrees of freedom since the marginal means are fixed in terms of the $x_{jk}$ values. In any case, the number of degrees of freedom in interaction is the product of the numbers in the interacting forces. Accordingly, an unbiased estimate of $\sigma^2$ from the rows is $q_1/(a - 1)$, from the columns is $q_2/(b - 1)$, and from interaction is $q_3/(a - 1)(b - 1)$. It is clear, therefore, that the z-distribution can be employed to test the significance of the variability attributable to these sources if the independence of the above-mentioned estimates is assured. A. T. Craig²¹ has settled this point by establishing the independence of the q's.

The quantities required in an analysis of variance are summarized in Table 14. They can be readily computed except possibly $q_3$. So long as the arithmetic involved in computing the other quantities is carefully checked it is sufficient to evaluate $q_3$ from relation (52). In other words, the sum of squares due to interaction may be found by subtracting $(q_1 + q_2)$ from the total sum of squares.

Table 14

    Variance due to    D. of F.          Sum of Squares                                           Unbiased Estimates
    Rows               a - 1             $q_1 = b\sum_{j=1}^{a}(\bar{x}_{j\cdot} - \bar{x})^2$     $q_1/(a - 1)$
    Columns            b - 1             $q_2 = a\sum_{k=1}^{b}(\bar{x}_{\cdot k} - \bar{x})^2$    $q_2/(b - 1)$
    Interaction        (a - 1)(b - 1)    $q_3 = Q - q_1 - q_2$                                     $q_3/(a - 1)(b - 1)$
    Total              ab - 1            $Q = \sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{x})^2$

Under the null hypothesis that there is no significant variation from row to row, the quantity

(53)    $z = \tfrac{1}{2}\log_e \frac{(b - 1)q_1}{q_3}$

will be distributed in accord with (45) and the hypothesis can be tested from critical values of z or, more conveniently, perhaps, from Snedecor's table by computing

(54)    $F = \frac{(b - 1)q_1}{q_3}$

and entering the table at $(n_1, n_2)$ where $n_1 = a - 1$, the degrees of freedom of $q_1$, and $n_2 = (a - 1)(b - 1)$. If the computed value falls above the critical value adopted, the null hypothesis is rejected for that value. Similarly, to test the null hypothesis that there are no significant effects from column to column we compute

(55)    $F = \frac{(a - 1)q_2}{q_3}$

and compare it with one of the tabular entries for $n_1 = b - 1$, $n_2 = (a - 1)(b - 1)$.


Example 4. In a feeding experiment a farmer has four types of hogs denoted by I, II, III, IV. These types are each divided into three groups which are fed varietal rations A, B, and C. The following results are obtained, the numbers in the table being the gains in weight in pounds in the various groups.

                 I       II      III      IV     Totals
    A           7.0     16.0     10.5    13.5     47.0
    B          14.0     15.5     15.0    21.0     65.5
    C           8.5     16.5      9.5    13.5     48.0
    Totals     29.5     48.0     35.0    48.0    160.5

The computations yield the following results:

                   Sum of Squares    D. of F.    Unbiased Estimates
    Rations           54.1250           2             27.06
    Types             87.7292           3             29.24
    Interaction       28.2083           6              4.70

To test the significance of the variation in rations we refer F = 27.06/4.70 = 5.76 to Snedecor's table where, corresponding to (2, 6) degrees of freedom, we find 5.14 for the 5% point and 10.92 for the 1% point. Similarly, to test the significance of the variation between types we compute F = 29.24/4.70 = 6.2. The entries in the table for (3, 6) degrees of freedom are 4.76 for the 5% point and 9.78 for the 1% point. Our conclusion is that there is a significant difference between breeds (somewhat doubtful) and between varieties of rations at the 5% point, but that neither is significant at the 1% point.
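The sums of squares of Example 4 can be reproduced from the table of gains. The following sketch computes the row, column, and interaction sums of squares from their definitions (rows are the rations, columns the types of hogs), checks that they add to the total, and forms the two variance ratios. The variable names are illustrative.

```python
# Gains in weight from Example 4: one row per ration, one column per type.
gains = {
    "A": [7.0, 16.0, 10.5, 13.5],
    "B": [14.0, 15.5, 15.0, 21.0],
    "C": [8.5, 16.5, 9.5, 13.5],
}

rows = list(gains.values())
a = len(rows)          # number of rows (rations)
b = len(rows[0])       # number of columns (types)

grand_mean = sum(sum(r) for r in rows) / (a * b)
row_means = [sum(r) / b for r in rows]
col_means = [sum(r[k] for r in rows) / a for k in range(b)]

# Total, row, column, and interaction sums of squares.
Q = sum((x - grand_mean) ** 2 for r in rows for x in r)
q1 = b * sum((m - grand_mean) ** 2 for m in row_means)
q2 = a * sum((m - grand_mean) ** 2 for m in col_means)
q3 = sum((rows[j][k] - row_means[j] - col_means[k] + grand_mean) ** 2
         for j in range(a) for k in range(b))

interaction_estimate = q3 / ((a - 1) * (b - 1))
F_rations = (q1 / (a - 1)) / interaction_estimate   # refer to (a-1, (a-1)(b-1))
F_types = (q2 / (b - 1)) / interaction_estimate     # refer to (b-1, (a-1)(b-1))
print(round(q1, 4), round(q2, 4), round(q3, 4), round(F_rations, 2), round(F_types, 2))
```

Computing the interaction term directly, rather than by subtraction, gives an independent check on the arithmetic.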

14. Testing Variation in Sub-sets of Means. In a previous chapter a method was given for testing the significance of a difference between two means. We shall now show that the analysis of variance technique lends itself to testing the significance of differences between any number of group means.

Consider b normal universes with means $\tilde{y}_x$ $(x = 1, 2, \cdots, b)$ and variance $\sigma^2$. Let samples of $N_x$ be drawn one from each of these universes and let $\bar{y}_x$ and $s_x^2$ be the mean and variance of the sample of $N_x$. Thus we have b classes or arrays (as in a correlation table). The notation for the samples is summarized in Table 15.

Table 15

    Classes                   1             2            ...    x             ...    b
    Means                    $\bar{y}_1$   $\bar{y}_2$   ...   $\bar{y}_x$    ...   $\bar{y}_b$
    Standard Deviations      $s_1$         $s_2$         ...   $s_x$          ...   $s_b$
    Frequencies              $N_1$         $N_2$         ...   $N_x$          ...   $N_b$

Our problem is to test, from the samples, the hypothesis that $\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$.

It can be shown (Cf. Part I, Ex. 3, p. 208) that the sum of the squares of deviations of the $N = \sum_{x=1}^{b} N_x$ variates y from the mean $\bar{y}$ of the entire set may be broken up into two parts such that

    $v = v_1 + v_2$

where

    $v = \sum (y - \bar{y})^2$

    $v_1 = \sum_{x=1}^{b} N_x(\bar{y}_x - \bar{y})^2$

    $v_2 = \sum_{x=1}^{b} N_x s_x^2.$

It is conventional to call $v_1$ the variation between classes and $v_2$ the variation within classes.

Under the hypothesis, an unbiased estimate of the common mean is $\bar{y}$ where $N\bar{y} = \sum_{x=1}^{b} N_x\bar{y}_x$. Hence there are $b - 1$ degrees of freedom in $v_1$. An unbiased estimate of $\sigma^2$ from the values of $\bar{y}_x$ is $v_1/(b - 1)$, and from the values of $s_x^2$ is $v_2/(N - b)$ since the variates in the computation of $s_x^2$ are subject to the linear restriction $\sum y = N_x\bar{y}_x$ and there are b values of x. Therefore, under the null hypothesis that $\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$, the quantity

(56)    $z = \tfrac{1}{2}\log_e \frac{(N - b)v_1}{(b - 1)v_2}$

is distributed in accord with (45) and the hypothesis can be tested by computing

(57)    $F = \frac{(N - b)v_1}{(b - 1)v_2}$

and comparing it with the entries in Snedecor's table for $(n_1, n_2)$ where $n_1 = b - 1$, $n_2 = N - b$. The quantities required in the computations are summarized in Table 16.


Table 16

    Variance due to      D. of F.    Sum of Squares    Unbiased Estimates
    Between Classes      b - 1       $v_1$             $v_1/(b - 1)$
    Within Classes       N - b       $v_2$             $v_2/(N - b)$
    Total                N - 1       $v$               $F = \frac{(N - b)v_1}{(b - 1)v_2}$

The variation within classes is independent of the principle of classification. Therefore, excessive variation between classes (variation of the $\bar{y}_x$'s) as compared with variation within classes (variation of sample values about their respective means) will cause F to fall above the critical value adopted, and the null hypothesis is contradicted or refuted for that value.

Examples from agricultural and certain branches of biological science will be found in the textbooks by Fisher and by Snedecor, and from the field of economics in Mills' text (revised edition).
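When there are only two classes the variance ratio of this section is the square of the t of the two-sample test. The sketch below illustrates this with the corn yields of Example 2 (Plot 1 in the reading that reproduces the printed total 60.0), for which t = 3.034, so that F should be about 9.20. The variable names are illustrative.

```python
# Two classes: the yields of Example 2.
classes = [
    [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8],
    [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5],
]

def mean(ys):
    return sum(ys) / len(ys)

N = sum(len(c) for c in classes)            # total number of variates
b = len(classes)                            # number of classes
grand_mean = sum(sum(c) for c in classes) / N

# Variation between classes and within classes.
v1 = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in classes)
v2 = sum((y - mean(c)) ** 2 for c in classes for y in c)
v = sum((y - grand_mean) ** 2 for c in classes for y in c)

F = ((N - b) * v1) / ((b - 1) * v2)         # refer to (b - 1, N - b)
print(round(v1, 4), round(v2, 4), round(F, 3))
```

Here $v_1 = .45$ and $v_2 = .88$, and F is to be referred to Snedecor's table at (1, 18) degrees of freedom.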

15. Testing Linear Regression. Consider a correlation table with b arrays in the x direction. Let $f(x)$ represent the frequency and $\bar{y}_x$ the mean in the array at x. Let $(\bar{x}, \bar{y})$ be the mean of the table and $m_1$ and $m_2$ the linear regression coefficients as defined in Part I. Suppose the $N = \sum f(x)$ entries in the table constitute a sample from a normal bivariate universe and we wish to test the hypothesis, H, that the regression of y on x is linear. It is shown in Part I that $Y_x - \bar{y} = m_1(x - \bar{x})$ is the equation of the line which fits the means of the arrays best, in a least-squares sense, and so $Y_x$ is the estimated mean of the array at x. (A slightly different notation was used in Part I.)

The variation B between arrays can be resolved into two components $B_1$ and $B_2$ such that

(58)    $B = B_1 + B_2$

where

    $B = \sum_{x=1}^{b} f(x)(\bar{y}_x - \bar{y})^2$

    $B_1 = \sum_{x=1}^{b} f(x)(\bar{y}_x - Y_x)^2$

    $B_2 = \sum_{x=1}^{b} f(x)(Y_x - \bar{y})^2.$

To establish (58) we may write B in the form

    $\sum_{x=1}^{b} f(x)\{(\bar{y}_x - Y_x) + (Y_x - \bar{y})\}^2$

which upon expansion equals $B_1 + B_2$ because, as the student may verify, the cross-product term vanishes.

It is shown in Part I that $B = N\sigma_y^2\eta_{yx}^2$ (Cf. (39), p. 200) and $B_2 = N\sigma_y^2 r^2$ (Cf. (16), p. 172). Since $B_2$ is the part of B which is accounted for by H it follows from (58) that $B_1 = N\sigma_y^2(\eta_{yx}^2 - r^2)$ is the part of B not accounted for by H. We are interested in the question, Is $B_1$ excessive compared with the random sampling fluctuations to be expected under the null hypothesis? To answer this question consider the variation W within arrays where

    $W = \sum_{x}\sum (y - \bar{y}_x)^2,$

the inner sum extending over the entries in the array at x. In Part I this was designated by $N{S'_y}^2$ which in turn is equal to $N\sigma_y^2(1 - \eta_{yx}^2)$. This variation within classes is due to a host of random forces which are not dependent on the value of x defining the arrays. Therefore, W provides a basis for testing whether $B_1$ is small enough to be accepted as the resultant of random forces under H or whether it is so large as to contradict H. Before we can use the z-test, however, the degrees of freedom must be reckoned. In B there are $b - 1$ degrees of freedom because the b values of $\bar{y}_x$ are subject to the linear restriction $\sum_{x=1}^{b} f(x)\bar{y}_x = N\bar{y}$. The number in $B_2$ may be determined by making use of the regression equation and writing $B_2$ in the form

    $\sum_{x=1}^{b} f(x)(Y_x - \bar{y})^2 = m_1^2\sum_{x=1}^{b} f(x)(x - \bar{x})^2.$

Since $\sum_{x=1}^{b} f(x)(x - \bar{x})^2$ is independent of the regression, the variation in $B_2$ must be due to the single statistic $m_1$ and therefore involve one degree of freedom. Hence, from (58), there are $b - 2$ degrees of freedom in $B_1$. Since there are b arrays there are $N - b$ degrees of freedom in W. Consequently,

    $z = \tfrac{1}{2}\log_e\left\{\frac{\eta_{yx}^2 - r^2}{1 - \eta_{yx}^2}\cdot\frac{N - b}{b - 2}\right\}$

is distributed in accord with (45) if H is true. The computed value of

    $F = \frac{\eta_{yx}^2 - r^2}{1 - \eta_{yx}^2}\cdot\frac{N - b}{b - 2}$

may, therefore, be compared with one of the entries in Snedecor's table for $n_1 = b - 2$, $n_2 = N - b$.

This is the test which was promised in Part I to replace the Blakeman criterion which Fisher proved was unsound. The student may construct a similar argument for testing an hypothesis of linear regression of x on y.
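The test of linearity can be sketched on a small table. The data below are hypothetical and serve only to illustrate the arithmetic; the variable names are likewise illustrative. The sketch splits the variation between arrays into the part about the fitted line ($B_1$) and the part due to the line ($B_2$), forms the variation within arrays (W), and shows that the variance ratio $\{B_1/(b-2)\}/\{W/(N-b)\}$ equals the expression in the correlation ratio and r.

```python
# Hypothetical correlation table: x value of each array -> observed y values.
arrays = {
    1.0: [2.0, 3.0, 2.5],
    2.0: [3.5, 4.0, 4.5, 4.0],
    3.0: [5.0, 4.5, 5.5],
    4.0: [5.0, 6.0, 5.5, 6.5],
}

N = sum(len(ys) for ys in arrays.values())       # total frequency
b = len(arrays)                                   # number of arrays
x_bar = sum(x * len(ys) for x, ys in arrays.items()) / N
y_bar = sum(sum(ys) for ys in arrays.values()) / N
array_mean = {x: sum(ys) / len(ys) for x, ys in arrays.items()}

# Least-squares slope of y on x and the fitted array means Y_x.
sxy = sum((x - x_bar) * (y - y_bar) for x, ys in arrays.items() for y in ys)
sxx = sum(len(ys) * (x - x_bar) ** 2 for x, ys in arrays.items())
m1 = sxy / sxx
fitted = {x: y_bar + m1 * (x - x_bar) for x in arrays}

B = sum(len(ys) * (array_mean[x] - y_bar) ** 2 for x, ys in arrays.items())
B1 = sum(len(ys) * (array_mean[x] - fitted[x]) ** 2 for x, ys in arrays.items())
B2 = sum(len(ys) * (fitted[x] - y_bar) ** 2 for x, ys in arrays.items())
W = sum((y - array_mean[x]) ** 2 for x, ys in arrays.items() for y in ys)

F = (B1 / (b - 2)) / (W / (N - b))                # refer to (b - 2, N - b)

# The same ratio from the correlation ratio and the correlation coefficient.
total = B + W
eta2 = B / total
r2 = B2 / total
F_from_ratios = (eta2 - r2) / (1 - eta2) * (N - b) / (b - 2)
print(round(B1, 4), round(B2, 4), round(W, 4), round(F, 3))
```

A value of F below the tabled point for $(b - 2, N - b)$ degrees of freedom leaves the hypothesis of linear regression tenable.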

16. Tests of Significance of r. Let the variables x, y be simultaneously distributed in accord with some one or other of the distribution functions

    $f(x, y) = Ke^{-P}, \qquad -\infty \le x \le \infty, \quad -\infty \le y \le \infty,$

where

    $\frac{1}{K} = 2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}$

    $P = \frac{1}{2(1 - \rho^2)}\left\{\frac{(x - \tilde{x})^2}{\sigma_x^2} - \frac{2\rho(x - \tilde{x})(y - \tilde{y})}{\sigma_x\sigma_y} + \frac{(y - \tilde{y})^2}{\sigma_y^2}\right\}$

and $\tilde{x}$, $\tilde{y}$, $\sigma_x$, $\sigma_y$, and $\rho$ are undetermined. In other words, suppose that the universe is some normal bivariate distribution. The question of the reliability of a value of r computed from a sample of N pairs of (x, y) from such a universe may conveniently be discussed under two cases.

Case I. When $\rho = 0$. In testing the significance of an observed value of r we are testing the hypothesis that $\rho = 0$. Under this hypothesis the sampling distribution of r is known to be

    $f(r) = k(1 - r^2)^{(N-4)/2}, \qquad -1 \le r \le 1,$

where $1/k = B\!\left(\tfrac{1}{2}, \tfrac{N-2}{2}\right)$. The curves represented by this function are symmetrical about r = 0 with

    $\sigma_r = (N - 1)^{-1/2}.$

As N becomes large the function is practically normal and consequently

(59)    $t = r(N - 1)^{1/2}$

tends to be normally distributed with mean zero and unit standard deviation. Therefore, to test the significance of a value of r computed from a large sample it would not be invalid, to any appreciable extent, to refer (59) to a normal probability scale.
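The stated properties of the distribution of r when $\rho = 0$ can be checked numerically. The sketch below takes the density in the standard form $(1 - r^2)^{(N-4)/2} / B(\tfrac12, \tfrac{N-2}{2})$, which is assumed here to be the function intended in the text, and verifies for N = 8 that the area is unity and that the variance is $1/(N - 1)$. The function names are illustrative.

```python
import math

def r_density(r, N):
    """Density of r for samples of N pairs from an uncorrelated normal universe."""
    # k is the reciprocal of the Beta function B(1/2, (N - 2)/2).
    k = math.gamma((N - 1) / 2) / (math.gamma(0.5) * math.gamma((N - 2) / 2))
    return k * (1 - r * r) ** ((N - 4) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

N = 8
area = simpson(lambda r: r_density(r, N), -1.0, 1.0)
variance = simpson(lambda r: r * r * r_density(r, N), -1.0, 1.0)
sigma_r = math.sqrt(variance)
print(round(area, 6), round(variance, 6), round(sigma_r, 4))
```

For N = 8 the standard deviation is $7^{-1/2} = .378$, so that a value of r must be large before it can be called significant in so small a sample.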

When N is small the problem may be resolved into an analysis of variance. In a correlation table, the total variation in the y direction, $N\sigma_y^2$, may be broken up into two parts, (1) the part $Nr^2\sigma_y^2$ which may be accounted for by an hypothesis of linear regression and (2) the residual part $NS_y^2 = N\sigma_y^2(1 - r^2)$. If there is no real correlation between the two variables then parts (1) and (2) are estimates of the same universe variance. Now to apply the z-test we must have unbiased estimates. There is one degree of freedom in part (1) and $N - 2$ in part (2). Consequently we may test the independence of y and x by computing

Table 17

                        Variation                                                D. of F.
    Regression line     $\sum_x (Y_x - \bar{y})^2 f(x) = Nr^2\sigma_y^2$          1
    Residuals           $\sum (y - Y_x)^2 = N\sigma_y^2(1 - r^2)$                 N - 2
    Totals              $\sum (y - \bar{y})^2 = N\sigma_y^2$                      N - 1

(60)    $z = \tfrac{1}{2}\log_e \frac{r^2(N - 2)}{1 - r^2}$

and seeing if it lies beyond the 5% or 1% points in the table for $n_1 = 1$, $n_2 = N - 2$. However, it is conventional to make use of Fisher's t-distribution. It can be shown (see Problem 10) that the distribution of t is a special case of that for z when $n_1 = 1$, $n_2 = n$, and $z = \tfrac{1}{2}\log_e t^2$. Therefore,

(61)    $t = \frac{r}{\left[\dfrac{1 - r^2}{N - 2}\right]^{1/2}}$

is distributed in accord with $F_n(t)$ for $n = N - 2$. In § 11 we observed that the 0.05 level of significance for z is the .025 point. However, when used as an alternative to t, the 0.05 point of z is also the 0.05 level because the whole distribution of t is equivalent to the positive half of the z-distribution in the sense that, for tests of significance, z ranges from 0 to $\infty$ whereas t ranges from $-\infty$ to $\infty$.

Tables are available (Fisher's text, Table V.A.) for applying this test directly from r, giving values of r on four levels of significance represented by P = .10, .05, .02, and .01, for various values of n. It might prove interesting to compare an entry in this table with the corresponding entries in the z and t tables. For example, when n = 18 (N = 20) we find from this table that r = .4438 lies on the P = .05 level, and making the transformation to z by (60) we obtain z = .7424 which agrees exactly with the entry in the z-table at the .05 point when $n_1 = 1$, $n_2 = 18$. Finally, when r = .4438 in (61), t = 2.101 which is the entry in the t-table at the .05 level.
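The comparison of the three tables can be retraced in a few lines. The sketch below converts r = .4438 with N = 20 into z by $\tfrac12\log_e\{r^2(N-2)/(1-r^2)\}$ and into t by $r\{(N-2)/(1-r^2)\}^{1/2}$, the forms assumed here for (60) and (61); the function names are illustrative.

```python
import math

def z_from_r(r, N):
    """Fisher's z for testing rho = 0, with (1, N - 2) degrees of freedom."""
    return 0.5 * math.log(r * r * (N - 2) / (1 - r * r))

def t_from_r(r, N):
    """Student's t for testing rho = 0, with N - 2 degrees of freedom."""
    return r * math.sqrt((N - 2) / (1 - r * r))

r, N = 0.4438, 20
z = z_from_r(r, N)
t = t_from_r(r, N)
print(round(z, 4), round(t, 3))

# The two statistics are tied together by z = (1/2) log t^2.
z_via_t = 0.5 * math.log(t * t)
```

The computed values agree with the tabled .7424 and 2.101 to within a unit in the last place.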

Case II. When $\rho \neq 0$. If the samples are large (N > 100) and if $\rho$ is small or only moderately large ($|\rho| < .6$ perhaps) then it is true that r is approximately normally distributed about the value $\rho$ with standard deviation of

    $\sigma_r = (1 - \rho^2)(N - 1)^{-1/2}.$

It is customary, under these conditions, to attach to an observed value of r a standard error of

    $(1 - r^2)(N - 1)^{-1/2}$

and, for a proposed $\rho$, to refer the computed value of

    $t = \frac{r - \rho}{\sigma_r}$

to a normal probability scale.

This procedure is invalid, however, if N is small and $\rho$ is large. The distribution of r from small samples is skew and the skewness increases with $\rho$. This may be understood intuitively by considering the distribution of r's from a universe in which $\rho$ is .9. The range of possible variation of r above $\rho$ is only .1. But the possible range below $\rho$ is 1.9. Accordingly the sampling distribution of r (N small) from this universe will be sharply skew. An extensive cooperative study of the distribution of r was made in 1917 by Soper and others. They succeeded in finding expressions for its moments and on this basis represented the distribution, for various values of N and $\rho$, by Pearson curves. They also gave an elaborate set of tables of ordinates for values of $\rho$ from 0 to 1 by increments of .1 and for values of r from $-1$ to $+1$ by increments of .05. The upper panel of Figure 22 (from Fisher's book) shows the r curves for two values of $\rho$ with N = 8, which (presumably) were drawn from the ordinates of these tables. They indicate the rapid departure from normality that may be expected for small samples as $\rho$ approaches high values.

Fig. 22. Upper panel: distribution curves plotted against the value of r observed (from $-1.0$ to 1.0). Lower panel: the corresponding curves plotted against the transformed value observed.

In his study of the sampling distribution of the correlation coefficient Fisher found that it was not desirable to use r as the independent variable and he introduced a transformation which has distinctive merits. He showed that the quantity*

(62)    $z' = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r}$

is approximately normally distributed and is nearly constant in form as $\rho$ changes. Its mode is always close to $\rho$. The lower panel of Figure 22 shows the distribution curves for $z'$ corresponding to the r curves in the upper panel. The standard deviation is

(63)    $\sigma_{z'} = (N - 3)^{-1/2}$

and is practically independent of $\rho$. The transformation is applicable in the following tests (among others).

(a) To test if an observed value of r differs significantly from a proposed theoretical value, $\rho$.

(b) To test if two observed values are significantly different.

The procedure for (a) is to calculate†

(64)    $t = (z' - z'')(N - 3)^{1/2}$

and refer the result to a normal probability scale. For (b) the procedure is to find, in accordance with (62), the two values of $z'$, say $z'_1$ and $z'_2$, corresponding to the two observed values of r, say $r_1$ and $r_2$, from samples of $N_1$ and $N_2$, respectively. Then compute $d = z'_1 - z'_2$ and $\sigma_d = \{1/(N_1 - 3) + 1/(N_2 - 3)\}^{1/2}$ and refer

    $t = \frac{d}{\sigma_d}$

to a normal probability scale.
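Both uses of the $z'$ transformation can be sketched briefly. The numerical values of r, $\rho$, and N below are hypothetical and chosen only to illustrate the procedure; the function names are likewise illustrative. The transformation $\tfrac12\log_e\{(1+r)/(1-r)\}$ is the inverse hyperbolic tangent of r, which is how it is computed here.

```python
import math

def z_prime(r):
    """Fisher's transformation (1/2) log((1 + r)/(1 - r))."""
    return math.atanh(r)

def deviate_against_rho(r, rho, N):
    """Test (a): normal deviate for an observed r against a proposed rho."""
    return (z_prime(r) - z_prime(rho)) * math.sqrt(N - 3)

def deviate_two_samples(r1, N1, r2, N2):
    """Test (b): normal deviate for the difference of two observed r's."""
    d = z_prime(r1) - z_prime(r2)
    sigma_d = math.sqrt(1 / (N1 - 3) + 1 / (N2 - 3))
    return d / sigma_d

def two_sided_p(t):
    """Probability of a normal deviate numerically as large as t."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical case (a): r = .70 from N = 28 pairs, proposed rho = .50.
t_a = deviate_against_rho(0.70, 0.50, 28)
# Hypothetical case (b): r = .30 and r = .70, each from 28 pairs.
t_b = deviate_two_samples(0.30, 28, 0.70, 28)
print(round(t_a, 3), round(two_sided_p(t_a), 3))
print(round(t_b, 3), round(two_sided_p(t_b), 3))
```

In case (a) the deviate is about 1.59, which would not be judged significant on the 5% level; in case (b) it is about $-1.97$, just beyond the 5% level.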

For numerical examples the student is referred to Fisher's book, §§ 33-35. Tables are also available there to facilitate the computation of $z'$ for an assigned r. One should observe that the $z'$ technique is not applicable to the case of simple tests of significance ($\rho = 0$). In that case Fisher's table of t is available.

* This quantity is not quite the same as the z used for the ratio of two variances and so we use a prime here to distinguish between them.

† In (64), $z''$ is the value of (62) when r is replaced by $\rho$.

Three final remarks seem appropriate. (1) In computing an r to be tested it is not desirable to apply Sheppard's corrections to $s_x$ and $s_y$ because they tend to increase the value of r. This also applies in testing for linear regression (§ 15). (2) It has been shown that the $z'$ procedure is applicable in testing the significance of partial correlation coefficients if N in $\sigma_{z'}$ is replaced by $N - k$ where k is the number of secondary subscripts in the coefficient. (3) All of the above procedures are strictly valid only for normal universes. However, there is considerable experimental evidence to indicate that they hold for all practical purposes provided the marginal distributions of one or both variables in the universe are not of the J- or U-shaped types. Of course, in those extreme cases one would naturally hesitate to use r as a measure of association.

Problems

1. Derive the expression for the expected value of $s^2$ in repeated samples of N independent observations from an arbitrary universe. Explain the use of this expression in estimating the variance of a universe.

2. In a certain observed distribution, $N = 20$, $\bar{x} = 42$, $s = 5$. Test the hypothesis that this distribution is a random sample from a normal universe with mean of 50.

3. In a certain test, one section of 20 students had an average score of 40 with a standard deviation of 5. Another section of 25 had an average of 46 with standard deviation of 4. Does this indicate a significant difference in the two groups? What assumptions do you make in applying the test?

4. In an experiment in industrial psychology a job was performed by one group of 30 workmen according to Method I and by a second group of 40 according to Method II. (The groups were independent and equally efficient.) Are the following distributions of the time (in seconds) taken such as to justify the conclusion that Method I is the speedier of the two? Use the difference between the means as a criterion of judgment.


    Time       I      II
     50        1       0
     51        3       1
     52        5       2
     53        4       5
     54        7       8
     55        5       9
     56        3       6
     57        1       3
     58        1       3
     59        0       1
     60        0       2
    Totals    30      40

5. From the separate distribution functions of 2 and s derive the distribution 

of ** Student's " and from that obtain the function Fn(0* 

6. Prove that Fn(t) is asymptotically normally distributed. 

7. Derive Fisher’s ^-distribution, G{z), 

8 . (Mitts’ textf revised.) Manufacturing industries were classified into those 

producing perishable, semi-durable, and durable goods. An average of 
changes occurring between 1929 and 1933 in the selling prices of the prod¬ 
ucts of each of these categories was computed giving the index numbers 
shown in the yx colunm of the following table. 


Class of industry, $x$            Number of           Means,        Computations
                                industries, $N_x$   $\bar{y}_x$

Producing perishable goods            34              69.81         $s - 1 = 2$, $N - s = 82$
Producing semi-durable goods          26              66.41         $v_1 = 2,161.8800$
Producing durable goods               25              78.96         $v_2 = 15,564.9040$

All industries                        85                            $v = 17,726.7840$


Compute F and test the null hypothesis that there was no real difference 
in the price movements of the three different classes of industry for the 
years 1929-1933. 

9. Prove that

$\sum_{x=1}^{2} N_x(\bar{y}_x - \bar{y})^2 = \frac{N_1 N_2}{N_1 + N_2}(\bar{y}_1 - \bar{y}_2)^2.$

10. Prove that the test for significance between two means is a special case of
the test for significant variation in sub-sets of means by showing that (56)
of § 14 reduces, when $s = 2$, to

$t = \frac{\bar{y}_1 - \bar{y}_2}{\hat{\sigma}} \sqrt{\frac{N_1 N_2}{N_1 + N_2}}$

where $\hat{\sigma}$ is an unbiased estimate of $\sigma$ and $t$ is distributed in accord with
$F_n(t)$ for $n = N_1 + N_2 - 2$.

The following three problems are from Fisher's book.

11. For the twenty years 1885-1904, the mean wheat yield of Eastern England
was found to be correlated with the autumn rainfall; the correlation was
found to be $-.629$. Is this value significant?

12. In a sample of $N = 25$ pairs of parent and child the correlation in a certain
character was found to be .60. Is this value consistent with the view
that the true correlation in that character was .46?

13. Of two samples the first, of 20 pairs, gives a correlation of .6, the second, of
25 pairs, gives a correlation of .8. Are these values significantly different?



CHAPTER VIII 

A. THE $\chi^2$ DISTRIBUTION AND APPLICATIONS


1. The Multinomial Law. The general term of the multinomial
expansion for $k$ mutually exclusive categories sets the stage for a
presentation of $\chi^2$ which provides an insight into the probability
theory of this important quantity and its usefulness in the testing of
hypotheses. So we begin with a preliminary treatment of the
multinomial law.

Consider an event that is characterized by a variable $v$ which can
take on one of $k$ values, $v_1, v_2, \ldots, v_k$. Let the probability that $v_i$
occurs be $p_i$, where $\sum_{i=1}^{k} p_i = 1$. Then in $N$ independent trials, the
probability that $v_1$ occurs $m_1$ times, $v_2$ occurs $m_2$ times, and so on,
in a specified order (whatever it may be) is

$p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$

where $\sum_{i=1}^{k} m_i = N$, the $m$'s being positive integers or zero. The number
of ways in which the order can be specified is the number of
permutations possible among $N$ objects of which $m_1$ are of type $T_1$,
$m_2$ of type $T_2$, $\ldots$, $m_k$ of type $T_k$. Let this number be denoted by
$P[m_i]$. Then we have

$P[m_i] = \frac{N!}{m_1!\, m_2! \cdots m_k!}$

Therefore, the probability that $m_1$ of the variates take the value $v_1$,
$m_2$ the value $v_2$, and so on, regardless of order is

(1) $f(m_1, m_2, \ldots, m_k) = P[m_i]\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$

which is the general term of the expansion of the multinomial

$(p_1 + p_2 + \cdots + p_k)^N.$

The law of repeated trials, for a simple dichotomy, given in Chapter I,
is a special case of this law. Thus if $k = 2$, the right member
of (1) reduces to

(2) $C(N, r)\, p^r q^{N-r}$

where

$r = m_1$, $N - r = m_2$, $p = p_1$, $q = 1 - p_1 = p_2$, $C(N, r) = N!/(m_1!\, m_2!)$.

If $v$ is the number of spots appearing on the top face in a throw of a
die, then $v$ will take on one of the values 1, 2, 3, 4, 5, 6, and the probability
of throwing exactly $r$ aces (say) in $N$ throws of the die is

$C(N, r)\left(\tfrac{1}{6}\right)^r \left(\tfrac{5}{6}\right)^{N-r}.$
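
The general term (1) and its reduction to (2) can be checked with a few lines of exact arithmetic. The sketch below is only an illustration: the function name `multinomial_probability` and the particular numbers (ten throws of a die, three aces) are our own choices, not part of the text.

```python
from fractions import Fraction
from math import comb, factorial


def multinomial_probability(counts, probs):
    """General term (1): N!/(m_1! ... m_k!) * p_1^m_1 ... p_k^m_k, computed exactly."""
    n_trials = sum(counts)
    ways = factorial(n_trials)
    for m in counts:
        ways //= factorial(m)  # each intermediate quotient is an integer
    prob = Fraction(ways)
    for m, p in zip(counts, probs):
        prob *= Fraction(p) ** m
    return prob


# k = 2: exactly r = 3 aces in N = 10 throws of a die, compared with form (2).
p, q = Fraction(1, 6), Fraction(5, 6)
from_multinomial = multinomial_probability((3, 7), (p, q))
from_binomial = comb(10, 3) * p**3 * q**7
print(from_multinomial == from_binomial)
```

Because the terms of (1) are the terms of the expansion of $(p_1 + \cdots + p_k)^N$, summing `multinomial_probability` over every admissible set of counts gives exactly 1.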

We recall that (2) is the general term of the expansion of the
binomial $(q + p)^N$. By using Stirling's approximation for factorials,
we can derive an approximation for (1) which will bear to the
multinomial law a relation analogous to that which the normal curve bears
to the binomial. With this objective in mind, assume that every $m_i$
is sufficiently large for $m_i!$ to be replaced by its Stirling approximation.
Making these replacements (1) becomes, after some algebraic
rearrangement,

(3) $f(m_1, m_2, \ldots, m_k) = \frac{1}{(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2}} \prod_{i=1}^{k} \left(\frac{N p_i}{m_i}\right)^{m_i + 1/2}.$

Next introduce the transformation

(4) $t_i = \frac{m_i - N p_i}{\sigma_i},$

$\sigma_i^2$ being $N p_i(1 - p_i)$. Under this transformation (3) becomes

$(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2} f = \prod_{i=1}^{k} \left(1 + \frac{\sigma_i t_i}{N p_i}\right)^{-(N p_i + \sigma_i t_i + 1/2)}.$

Then

$\log \text{L.M.} = \sum_{i=1}^{k} \left(-N p_i - \sigma_i t_i - \tfrac{1}{2}\right) \log\left(1 + \frac{\sigma_i t_i}{N p_i}\right)$

where L.M. denotes the left-hand member of the preceding equation.
Upon expanding the logarithm in a power series and collecting the
results according to descending powers of $N$, we obtain

$\log \text{L.M.} = -\sum_{i=1}^{k} \left(\sigma_i t_i + \frac{\sigma_i^2 t_i^2}{2 N p_i} + \text{terms of lower order}\right).$

Let each $m_i$ in $\sum_{i=1}^{k} m_i = N$ be transformed in accordance with (4).
The result is

$\sum_{i=1}^{k} (N p_i + \sigma_i t_i) = N,$

whence it follows that $\sum_{i=1}^{k} \sigma_i t_i = 0$ since $\sum_{i=1}^{k} p_i = 1$. Therefore,
remembering the value of $\sigma_i^2$, (3) may be written

$f(m_1, m_2, \ldots, m_k) = \frac{1}{(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2}} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{k} (1 - p_i)\, t_i^2\right].$

The form of the exponent of $e$ suggests the substitution of a new
variable $x_i = t_i(1 - p_i)^{1/2}$ in place of $t_i$. Upon making this substitution
we have

(5) $f(m_1, m_2, \ldots, m_k) = \frac{1}{(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{k} x_i^2}$

where $x_i = (m_i - N p_i)(N p_i)^{-1/2}$ and $N p_i$ is the mean or expected
value of $m_i$.
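
A quick numerical comparison gives a feeling for how closely (5) follows the exact multinomial term (1) when every $m_i$ is fairly large. In this sketch the probabilities $(.2, .3, .5)$, the sample size $N = 100$ and the function names are illustrative assumptions; the exact term is evaluated through logarithms of factorials to avoid overflow.

```python
from math import exp, lgamma, log, pi, prod, sqrt


def exact_multinomial(counts, probs):
    """Exact general term (1), evaluated through log-factorials."""
    n = sum(counts)
    log_value = lgamma(n + 1)
    for m, p in zip(counts, probs):
        log_value += m * log(p) - lgamma(m + 1)
    return exp(log_value)


def approx_multinomial(counts, probs):
    """Approximation (5): exp(-sum(x_i^2)/2) / ((2 pi N)^((k-1)/2) (p_1...p_k)^(1/2))."""
    n = sum(counts)
    k = len(counts)
    sum_sq = sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))
    return exp(-0.5 * sum_sq) / ((2 * pi * n) ** ((k - 1) / 2) * sqrt(prod(probs)))


probs = (0.2, 0.3, 0.5)
counts = (20, 30, 50)      # every m_i equal to its expected value N p_i
exact = exact_multinomial(counts, probs)
approx = approx_multinomial(counts, probs)
print(exact, approx, exact / approx)
```

At the expected frequencies the two values agree to within about one per cent; the agreement deteriorates as the $m_i$ move far from $N p_i$ or as some $m_i$ becomes small, which is why the derivation requires every $m_i$ to be large.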

Now, following Wilks, the $x$'s are independent except for the single
linear restriction $\sum_{i=1}^{k} (N p_i)^{1/2} x_i = 0$. Let $R$ be the region in the $x$-space
subject to the linear restriction just given corresponding to any
region $R_m$ in the $m$-space. Since the $m$'s are always integers, the
change in $x_i$ corresponding to a change of unity in $m_i$ is $(N p_i)^{-1/2} = \Delta x_i$.
Treating $k - 1$ of the $x$'s, say $x_1, x_2, \ldots, x_{k-1}$, as the independent
variables, and using an extension of the fundamental theorem on the
existence of a definite integral (Riemann), we have

(6) $\lim_{N \to \infty} \sum_{R} \frac{e^{-\frac{1}{2}\sum_{i=1}^{k} x_i^2}}{(2\pi)^{(k-1)/2}(p_k)^{1/2}}\, \Delta x_1 \Delta x_2 \cdots \Delta x_{k-1} = \frac{1}{(2\pi)^{(k-1)/2}(p_k)^{1/2}} \int_{R} e^{-\frac{1}{2}\sum_{i=1}^{k} x_i^2}\, dx$

where for a given $N$, $\sum_{R}$ denotes the summation over all points in the
region $R$ corresponding to those in $R_m$ for which $f(m_1, m_2, \ldots, m_k)$
is defined. The integral is $(k-1)$-dimensional, and $dx = dx_1\, dx_2 \cdots dx_{k-1}$.

2. The $\chi^2$ Distribution. The quantity

(7) $\sum_{i=1}^{k} x_i^2 = \chi^2$

is used as an index of the extent to which the set of $m$'s taken as a
whole cluster about their respective expected values. Later on we
will explain the practical import of this index. For the present we
confine our attention to the purely mathematical problem of finding
the distribution function of $\chi^2$. First, we consider the problem of
finding the distribution function of $\chi$. To this end we observe that,
corresponding to different values of $\chi$, (7) defines a set of $k$-dimensional
hyperspheres all having their centers at the origin of the $x_i$-axes
and no two intersecting. Now we can obtain the distribution of $\chi$
by determining the value of the integral in (6) when $R$ consists of the
region bounded by the concentric hyperspheres

(8) $\sum_{i=1}^{k} x_i^2 = \chi^2$ and $\sum_{i=1}^{k} x_i^2 = (\chi + d\chi)^2$

subject to the condition that

(9) $\sum_{i=1}^{k} (N p_i)^{1/2} x_i = 0.$

Since this last equation is a hyperplane through the common center
of the hyperspheres, the region $R$ is therefore a "shell" of a $(k-1)$-dimensional
hypersphere. Within this shell

$e^{-\frac{1}{2}\sum_{i=1}^{k} x_i^2} = e^{-\frac{1}{2}\chi^2}$

to within terms of order $d\chi$.

Now it can be shown that the volume $V$ of an $s$-dimensional hypersphere
of radius $r$ is

$V = C r^s$

where $C$ is independent of $r$. The volume between two concentric
hyperspheres of radii $r$ and $r + dr$ is therefore approximately

(10) $dV = sC r^{s-1}\, dr.$


Returning to the $\chi$ problem, it is clear from (10) that if the region
bounded by the hyperspheres in (8), subject to the restriction given
by (9), is chosen as the element of volume, then the probability that
$\left(\sum_{i=1}^{k} x_i^2\right)^{1/2}$ will lie in the interval from $\chi$ to $\chi + d\chi$ is

(11) $df = K e^{-\frac{1}{2}\chi^2} \chi^{k-2}\, d\chi.$

Here $K$ is independent of $\chi$ and can be determined by the condition

$\int_{0}^{\infty} df = 1.$

Using the Gamma function, we find

$K = \frac{1}{2^{(k-3)/2}\, \Gamma\left(\frac{k-1}{2}\right)}.$

The distribution of $\chi^2$ is thus given by

(12) $T_{k-1}(\chi^2)\, d(\chi^2) = \frac{(\chi^2)^{(k-3)/2}\, e^{-\frac{1}{2}\chi^2}}{2^{(k-1)/2}\, \Gamma\left(\frac{k-1}{2}\right)}\, d(\chi^2).$

The number $k - 1$ is the number of degrees of freedom, which is the
number of $x$'s which are independent in (6).

3. Tables. The probability of obtaining a sample of $x$'s for which
$\chi^2$ is greater than an assigned value, say $\chi_0^2$, is given by

(13) $P(\chi^2 > \chi_0^2) = \int_{\chi_0^2}^{\infty} T_{k-1}(\chi^2)\, d(\chi^2).$

The symbol on the left in (13) may be abbreviated to $P$ when there is
no ambiguity. It is obvious that $\chi^2$ is never negative and may vary
from 0 (when there is no difference between the observed and expected
frequencies) to very large values. As $\chi^2$ increases from 0 to
$\infty$, the probability $P$ given by (13) decreases from 1 to 0. The student
will recognize $T_{k-1}(\chi^2)$ as a Pearson Type III curve and the
integral in (13) as essentially an incomplete Gamma function. Values
of $P$ can be found in Pearson's Tables and we have included in the
Appendix (see Table III) a short table, from Fisher's book, giving
values of $\chi^2$ corresponding to specially selected values of $P$. In our
table, $n = k - 1$.
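
Where a table is not at hand, the probability $P$ can be approximated by integrating the $\chi^2$ density numerically. The following is a minimal sketch, assuming the standard density with $n = k - 1$ degrees of freedom and a composite Simpson rule; the function names, the step count, and the restriction to $n \geq 2$ (for $n = 1$ the density is unbounded at the origin) are our own choices.

```python
from math import exp, gamma


def chi2_density(u, n):
    """Density of u = chi-square with n = k - 1 degrees of freedom (n >= 2 here)."""
    return u ** (n / 2 - 1) * exp(-u / 2) / (2 ** (n / 2) * gamma(n / 2))


def simpson(func, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def upper_tail(chi2_0, n):
    """P(chi-square > chi2_0), computed as 1 minus the integral over (0, chi2_0)."""
    return 1.0 - simpson(lambda u: chi2_density(u, n), 0.0, chi2_0)


# The density integrates to 1, and P falls from 1 toward 0 as chi2_0 grows.
print(simpson(lambda u: chi2_density(u, 7), 0.0, 80.0))
print([round(upper_tail(x, 7), 3) for x in (0.0, 2.0, 6.0, 12.0, 24.0)])
```

The first printed value checks the normalization that fixes the constant $K$; the second line shows $P$ decreasing steadily as $\chi_0^2$ increases.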

For fairly large values of $k$, $(2\chi^2)^{1/2}$ is approximately normally
distributed about a mean $(2k - 1)^{1/2}$ with unit standard deviation.
Therefore, one may refer

$t = (2\chi^2)^{1/2} - (2k - 1)^{1/2}$

to a normal probability scale when $k > 30$.

4. Applications. The $\chi^2$-test was designed by its originator,
Karl Pearson, as a criterion for testing hypotheses about frequency
distributions. These hypotheses may be classified into two types
which we will call simple and composite. We are making an explicit
distinction between them and considering them separately to avoid
certain misunderstandings which have sometimes occurred, in the
past, in the applications of the test. To be more specific, there has
been, as a result of confounding hypotheses to be tested, some controversy
over the appropriate number of degrees of freedom to use in
entering the tables for $P(\chi^2 > \chi_0^2)$.




Simple Hypothesis. Under this heading we will consider those
cases in which the theoretical frequencies are known a priori, that is,
when they are not inferred in any way from the sample.

Suppose that we have a set of $k$ observed frequencies

$m_1 + m_2 + \cdots + m_k = N$

constituting a sample from a hypothetical universe (supposedly
infinite) in which the relative frequencies in the $k$ categories are
known to be $p_1, p_2, \ldots, p_k$, respectively, where $p_i = \tilde{m}_i/N$. Then,
corresponding to the observed frequencies, we have a set of $k$ theoretical
frequencies such that

$\tilde{m}_1 + \tilde{m}_2 + \cdots + \tilde{m}_k = N.$

An example would be, for the $m$'s, the frequency of heads obtained in
tossing $N$ coins $k$ times, and, for the $\tilde{m}$'s, the corresponding theoretical
frequencies given by the terms in the expansion of the binomial
$k(\tfrac{1}{2} + \tfrac{1}{2})^N$. In comparing the observed and theoretical frequencies
a question quite naturally arises as to whether the aggregate discrepancy
between them could be explained on the basis of chance
fluctuations under the hypothesis that $\tfrac{1}{2}$ is the probability of success
in each trial. More generally, we are interested in such a question
as the following. On the hypothesis that an observed distribution is
a random sample from a proposed universe, what is the probability
that, taken as a whole, the discrepancy between theory and observation
would yield a value of $\chi^2$ as large as, or larger than, the value
obtained? The hypothesis is to be rejected whenever the probability
is considered "small."

If we let $x_i = (m_i - \tilde{m}_i)/\sqrt{\tilde{m}_i}$ it is clear that the $x$'s are subject to
the linear homogeneous restriction given by (9) with $n = k - 1$
degrees of freedom because, if $k - 1$ of the $x$'s are fixed, the $k$th is
determined. In the case of a simple hypothesis, then, Fisher's table
of $P$ is to be entered with $n = k - 1$.

With regard to levels of significance, Fisher says:

In preparing this table we have borne in mind that in practice we do not want
to know the exact value of $P$ for any observed $\chi^2$, but, in the first place, whether
or not the observed value is open to suspicion. If $P$ is between .1 and .9 there
is certainly no reason to suspect the hypothesis tested. If it is below .02 it is
strongly indicated that the hypothesis fails to account for the whole of the facts.
We shall not often be astray if we draw a conventional line at .05, and consider
that higher values of $\chi^2$ indicate a real discrepancy.

Composite Hypothesis. In the majority of practical cases, the
frequencies are not known a priori and must be estimated from the
sample. Thus, in a graduation by means of the normal curve the
theoretical frequencies are obtained by imposing the conditions that
the universe has the same mean and standard deviation as the sample.
The $\chi^2$-test can be accurately applied only if allowance is made for
the number of parameters which are determined from the sample in
reconstructing the universe. Suppose there are $q$ parameters in the
function representing the universe and these are to be determined
from the sample by the principle of moments. Since any moment is
a linear function of the frequencies (it will be remembered that the
frequencies are the variables in this discussion), the determination
of the $q$ parameters involves $q$ linear restrictions. We have seen in
§ 2 that the restriction imposed by (9) reduced our problem from a
space of $k$ dimensions to a space of $k - 1$ dimensions. Quite analogously,
$q$ additional linear restrictions reduce the space to $k - 1 - q$
dimensions. Accordingly,* in testing divergence from a universe
specified by a function $f(v, a, b, c, \ldots)$ where $v$ is the variable of the
distribution and $a, b, c, \ldots$ are $q$ disposable parameters which are
to be estimated from the sample, the number of degrees of freedom
with which to enter the tables of $P$ is $n = k - 1 - q$.

The following two conditions should be fulfilled in applying the
$\chi^2$-test (for both simple and composite hypotheses).

1. No class should contain very few items because, in the derivation
of (11), it was assumed that $m_i$ was sufficiently large to replace
$m_i!$ by its Stirling approximation.

2. The number of classes should not be very large since it can be
shown, by expanding the integrand in (13) into a power series, that
$P \to 1$ as $k \to \infty$.

We shall interpret, somewhat arbitrarily, these conditions to mean
that $P$ cannot be guaranteed when $m < 5$ and $k > 20$. To satisfy
the first condition, it is customary to lump together the small
frequencies at the ends of the distribution.

Example 1. Twelve dice were thrown 4096 times; only a throw of six was
counted a success. The expected frequencies are given by $4096(\tfrac{5}{6} + \tfrac{1}{6})^{12}$.
How improbable, taken as a whole, is the observed distribution shown in
Table 18?

* Strictly speaking, the determination of the parameters by the method of
moments does not lead to a system of equations which are exactly analogous
to (9).

Table 18

Number of      Observed         Theoretical
Successes      Frequency, $m$   Frequency, $\tilde{m}$   $(m - \tilde{m})^2$   $(m - \tilde{m})^2/\tilde{m}$

0                  447              459              144           .3137
1                 1145             1103             1764          1.5993
2                 1181             1213             1024           .8442
3                  796              809              169           .2089
4                  380              364              256           .7033
5                  115              116                1           .0086
6                   24               27                9           .3333
7 and over           8                5                9          1.8000

Totals            4096             4096                     $\chi^2 = 5.8113$


Entering Table III (see Appendix) with $n = 8 - 1 = 7$, and interpolating for
the value of $P$ corresponding to the observed value of $\chi^2 = 5.8113$, we find
$P = .56$. Hence there is no reason to reject the hypothesis that the underlying
chance of a "success" is $p = \tfrac{1}{6}$. That is, there is no reason to suspect that the
dice were biased.
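
The arithmetic of Example 1 can be reproduced directly from the observed and theoretical frequencies of Table 18. In the sketch below the tail probability is obtained from the power series for the incomplete Gamma function rather than by interpolation in a table; that substitution, and the function names, are our own and are offered only as one convenient way of checking the figures.

```python
from math import exp, lgamma, log

observed = [447, 1145, 1181, 796, 380, 115, 24, 8]
theoretical = [459, 1103, 1213, 809, 364, 116, 27, 5]


def chi_square(obs, theo):
    """Sum of (m - m~)^2 / m~ over the classes."""
    return sum((m - t) ** 2 / t for m, t in zip(obs, theo))


def chi2_upper_tail(x, n):
    """P(chi-square > x) for n degrees of freedom, from the series for the
    regularized lower incomplete Gamma function P(n/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = n / 2, x / 2
    term = 1.0 / a
    total = term
    j = 1
    while term > 1e-16 * total:
        term *= z / (a + j)
        total += term
        j += 1
    lower = total * exp(a * log(z) - z - lgamma(a))
    return 1.0 - lower


chi2 = chi_square(observed, theoretical)
degrees_of_freedom = len(observed) - 1       # simple hypothesis: n = k - 1
p_value = chi2_upper_tail(chi2, degrees_of_freedom)
print(round(chi2, 4), round(p_value, 2))
```

For a composite hypothesis the same functions apply, with the degrees of freedom reduced by the number of parameters estimated from the sample.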

Example 2. An observed distribution was graduated by means of the normal 
curve (see Part I, p. 123) with the results shown in Table 19. Test the hypoth¬ 
esis that the observed distribution was a sample from a normal universe with 
mean and standard deviation equal respectively to those of the sample. 


Table 19 


Central        Observed       Theoretical
Values         Frequency      Frequency

[illegible]        1              2.5
[illegible]    [illegible]    [illegible]     (pooled theoretical frequency 14.7)
37.5              56             60.2
41.5             172            155.4
45.5             245            252.6
49.5             263            258.8
53.5             156            167.2
57.5              67             68.0
61.5              26          [illegible]
65.5               3              3.1

Totals          1000           1000.0


It is found that $\chi^2 = 4.82$. After pooling the end frequencies, as shown, $k = 8$.
So entering Table III for $n = 8 - 1 - 2 = 5$, we find that $P > .4$. Hence the
$\chi^2$-test does not reject the hypothesis.

For applications of the $\chi^2$-test to contingency tables the reader is referred to
Fisher's book.


B. STATISTICAL INFERENCE

5. Induction versus Deduction. To contrast the inductive problems,
which we are about to consider, with deductive problems, we
shall review briefly a deductive type of argument which we have
previously discussed. Suppose $D(t)$ is the distribution function of
a statistic $t$ computed from a sample from a universe specified with
respect to functional form and parameters. Then $\int_{-\infty}^{\delta} D(t)\, dt$ gives
the probability that an observed value of $t$ will not exceed an assigned
value of $\delta$. Thus in Chapter VI we learned that the means of
samples cluster about the mean of the universe, and Theorem X
of that chapter gave us the probability that a sample mean would
have a numerical value within $\delta$ of the mean of the universe. This
is a deductive argument. Presently we shall consider certain inverse
problems which arise in arguing from samples and their statistics
back to universes and their parameters. First, however, we shall
examine Bayes' Theorem. The following quotation from R. A.
Fisher will serve as a setting for our consideration of this theorem.

Thomas Bayes' paper of 1763 was the first attempt known to us to rationalize
the process of inductive reasoning. From time immemorial, of course, men had
reasoned inductively; sometimes, no doubt, well, and sometimes badly, but
the uncertainty of all such inferences from the particular to the general had
seemed to cast a logical doubt on the whole process. By the middle of the
eighteenth century, however, experimental science had taken its first strides, and
all the learned world was conscious of the effort to enlarge knowledge by experiment,
or by carefully planned observation. To such an age the limitations of a
purely deductive logic were intolerable. Yet it seemed that mathematicians
were willing to admit the cogency only of purely deductive reasoning. From
an exact hypothesis, well defined in every detail, they were prepared to reason
with precision as to its various particular consequences. But, faced with a
finite, though representative, sample of observations, they could make no rigorous
statements about the population from which the sample had been drawn.

Bayes perceived the fundamental importance of this problem and framed an
axiom, which, if its truth were granted, would suffice to bring this large class of
inductive inferences within the domain of the theory of probability; so that,
after a sample had been observed, statements about the population could be
made, uncertain inferences, indeed, but having the well-defined type of uncertainty
characteristic of statements of probability. Bayes' technique in this
feat is ingenious. His predecessors had supplied adequate methods, given a
well-defined population, for stating the probability that any particular type of
population might have given rise to it. He imagines, in effect, that the possible
types of population have themselves been drawn, as samples, from a super-population,
and his axiom defines this super-population with exactitude. His
problem thus becomes a purely deductive one to which familiar methods were
applicable.

6. Bayes' Theorem. To derive Bayes' theorem, consider a bivariate
universe of discrete variables in which $x$ takes the values
$x_1, x_2, \ldots, x_n$, and $y$ the values $y_1, y_2, \ldots, y_m$. Let $P(x_i, y_j)$ represent
the probability for the joint occurrence of $(x_i, y_j)$. Let
$P(y_j \mid x_i)$ be the probability that $y$ takes the value $y_j$ when it is
known that $x$ has taken the value $x_i$. Then

(14) $P(y_j \mid x_i) = \frac{P(x_i, y_j)}{g(x_i)}$

where $g(x_i) = \sum_{j=1}^{m} P(x_i, y_j)$ is the marginal distribution of $x$ in the
bivariate universe and represents the a priori probability that $x$ takes
the value $x_i$. Let us write (14) in the form

(15) $P(x_i, y_j) = g(x_i) P(y_j \mid x_i).$

By a similar argument we may write

(16) $P(x_i, y_j) = h(y_j) P(x_i \mid y_j),$

where $h(y_j) = \sum_{i=1}^{n} P(x_i, y_j)$ is the marginal distribution of $y$, and
$P(x_i \mid y_j)$ is the probability that $x = x_i$ when it is known that $y = y_j$.
It is clear from the preceding relations that

(17) $h(y_j) = \sum_{i=1}^{n} g(x_i) P(y_j \mid x_i).$

Since $P(x_i, y_j)$ means exactly the same thing in (15) and (16) we may
equate their right members and solve for $P(x_i \mid y_j)$. The result is

(18) $P(x_i \mid y_j) = \frac{g(x_i) P(y_j \mid x_i)}{h(y_j)}.$

This is Bayes' theorem and it may be stated as follows.

Bayes' Theorem. The probability that $x = x_i$ when $y = y_j$ is equal
to the product of the probabilities that $x = x_i$, and that $y = y_j$ when
$x = x_i$, divided by the probability that $y = y_j$.

The theorem is usually expressed symbolically in the somewhat
different form to which it reduces when (17) is substituted for the
denominator of (18). This form is

(19) $P(x_i \mid y_j) = \frac{g(x_i) P(y_j \mid x_i)}{\sum_{i=1}^{n} g(x_i) P(y_j \mid x_i)}.$


To connect Bayes' theorem with a posteriori, or inverse, probability,
suppose in (19) that the $x$'s denote certain initial situations and
the $y$'s denote events subsequently observed. The a priori probability
for the existence (occurrence) of the initial situation characterized
by $x_i$ is $g(x_i)$. $P(y_j \mid x_i)$ is the a priori probability that $y_j$
will occur when $x_i$ exists. Then (19) gives the a posteriori probability
that the $i$th initial situation has produced the observed event
specified by $y_j$.

The following examples will clarify the theorem and serve to focus
attention on its weakness. The first example, a somewhat artificial
one, is designed to illustrate a situation where the existence probabilities
$g(x_i)$ are equal. The second will describe a situation when
nothing is known about them.


Example 3. (Molina.) During his sophomore year Tom Smith played on
both the baseball and football teams; we have been informed that he broke his
ankle in one of the games; what are the a posteriori probabilities in favor of
baseball and football, respectively, as the baneful cause of the accident? Evidently
the answer depends on the number of baseball and football games played
during their respective seasons and also on the likelihood of a man breaking an
ankle in one or the other of these two sports. As a concrete case assume that:

(a) At Smith's college an equal number of baseball and football games are
played per season;

(b) Statistical records indicate that if a student participates in a baseball
game the probability is $\tfrac{2}{100}$ that he will break an ankle and that, likewise, the
probability is $\tfrac{7}{100}$ for the same contingency in a football game.

Solution. Associate $x_1$ and $x_2$ with the admissible causes, baseball and football,
respectively. Associate $y_1$ with the accident. From condition (a) of the
problem, the existence probability for baseball is $g(x_1) = \tfrac{1}{2}$. Also $P(y_1 \mid x_1) = \tfrac{2}{100}$
and $P(y_1 \mid x_2) = \tfrac{7}{100}$. From (19), then, the a posteriori probability for
baseball is

$P(x_1 \mid y_1) = \frac{\frac{1}{2} \cdot \frac{2}{100}}{\frac{1}{2} \cdot \frac{2}{100} + \frac{1}{2} \cdot \frac{7}{100}} = \frac{2}{9}.$

It follows that the a posteriori probability in favor of football is $\tfrac{7}{9}$.
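
The computation in Example 3 is form (19) applied to two admissible causes, and it can be sketched in a few lines. The function name `posterior` is our own, and the two accident probabilities are read here as $\tfrac{2}{100}$ for baseball and $\tfrac{7}{100}$ for football; those readings are an assumption, since the printed fractions in condition (b) are difficult to make out.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Form (19): g(x_i) P(y | x_i) divided by the sum of g(x_i) P(y | x_i) over all i."""
    joint = [g * p for g, p in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# x_1 = baseball, x_2 = football; y_1 = the broken ankle.
priors = [Fraction(1, 2), Fraction(1, 2)]             # condition (a): equal numbers of games
likelihoods = [Fraction(2, 100), Fraction(7, 100)]    # condition (b), as read above
baseball, football = posterior(priors, likelihoods)
print(baseball, football)
```

Because the existence probabilities are equal, they cancel, and the a posteriori probabilities stand in the ratio of the two accident probabilities.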

Example 4. An urn contains five balls, black, white, or both kinds. Of three
balls drawn together and at random (each ball within the urn is equally likely to
be drawn), two are black and one is white. What is the probability that the
urn contains three black and two white balls?

Solution. Associate $x_1, x_2, \ldots, x_6$ with the possible compositions of the urn
before the drawing was made, namely, 0B, 5W; 1B, 4W; $\ldots$; 5B, 0W. Associate
$y_1, y_2, y_3, y_4$ with the possible compositions in the drawing of three balls, namely,
0B, 3W; 1B, 2W; 2B, 1W; 3B, 0W. The composition corresponding to $y_3$
was obtained and we seek the probability that it came from an urn with composition
specified by $x_4$. That is, we seek $P(x_4 \mid y_3)$. Clearly,

$P(y_3 \mid x_4) = \frac{C(3, 2)\, C(2, 1)}{C(5, 3)} = \frac{3}{5},$

so from (19) we have

(20) $P(x_4 \mid y_3) = \frac{\frac{3}{5}\, g(x_4)}{\sum_{i=1}^{6} g(x_i)\, \dfrac{C(i-1, 2)\, C(6-i, 1)}{C(5, 3)}},$

it being understood, of course, that $C(n, r) = 0$ when $n < r$.

Since the values of $g(x_i)$ are unknown the problem does not have
a unique solution. Moreover, if they were known we would be
back in the domain of deductive probabilities again since all the
probabilities in the right-hand member of (20) would then be known
a priori. It is only when the $g(x_i)$ are unknown that we are properly
in the domain of a posteriori probability. In practical problems
the $g(x_i)$ are scarcely ever known.
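
How strongly the answer to Example 4 depends on the assumed existence probabilities can be seen by evaluating (19) under two different assumptions. Both sets of $g(x_i)$ below are hypothetical choices made only for illustration: one treats the six compositions as equally probable, the other supposes that each ball was independently black with probability $\tfrac{1}{2}$. The function names are likewise our own.

```python
from fractions import Fraction
from math import comb

URN_SIZE = 5


def likelihood_two_black_one_white(black_in_urn):
    """P(y_3 | urn holds `black_in_urn` black balls): 2 black and 1 white in 3 draws."""
    white_in_urn = URN_SIZE - black_in_urn
    # math.comb returns 0 when fewer balls are present than are to be drawn.
    return Fraction(comb(black_in_urn, 2) * comb(white_in_urn, 1), comb(URN_SIZE, 3))


def posterior_three_black(prior):
    """Form (19) for the urn with 3 black, 2 white; prior[b] is g for b black balls."""
    joint = [g * likelihood_two_black_one_white(b) for b, g in enumerate(prior)]
    return joint[3] / sum(joint)


uniform_prior = [Fraction(1, 6)] * 6
# Hypothetical alternative: each ball independently black with probability 1/2.
binomial_prior = [Fraction(comb(5, b), 32) for b in range(6)]

print(posterior_three_black(uniform_prior))
print(posterior_three_black(binomial_prior))
```

The two assumptions give different a posteriori probabilities for the same observed drawing, which is the sense in which the problem has no unique solution until the $g(x_i)$ are specified.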

Bayes realized this and argued that the $x$'s may be considered
equally probable unless we have some reason to think they are not.
Under this "doctrine of insufficient reason," the $x$'s are assumed to
have equal existence probabilities. In this case, $g(x_i)$ = constant
and would cancel in (19), thus permitting a definite solution in (20).
It appears that Bayes had serious doubts about this "doctrine," for
he withheld his entire treatise from publication until his doubts
should be resolved, and it was only after his death that his paper
was published by friends. Laplace, however, was less cautious, and
he incorporated the doubtful theorem into his Théorie Analytique des
Probabilités. Robed in the authority of Laplace it went unquestioned
for a long time. Boole was the first, in 1854, to criticize the
assumption of "the equal distribution of our knowledge, or rather
of our ignorance" and "the assigning to different states of things of
which we know nothing, equal degrees of probability." Today, it
is well known that the assumption of constant existence probabilities
may lead to mathematical contradictions. This may clearly be seen
in the analogue to (19) for continuous variables. The following
illustration of such a contradiction is cited by Wilks (loc. cit.).

Let $\theta$ be a parameter characterizing the universe and $t$ a statistic
from the sample. Then the analogue to (19) for the continuous
case is

(21) $F(\theta \mid t)\, d\theta = \frac{g(\theta) f(t \mid \theta)\, d\theta\, dt}{dt \int g(\theta) f(t \mid \theta)\, d\theta}.$

Now, if according to the "doctrine of insufficient reason" we may
assume $g(\theta)$ to be constant, (21) reduces to

(22) $F(\theta \mid t)\, d\theta = \frac{f(t \mid \theta)\, d\theta}{\int f(t \mid \theta)\, d\theta}.$

But by the very nature of this "doctrine" there is no more reason
to assume the a priori probability function of $\theta$ to be constant than
there is to assume the a priori probability distribution of some
function of $\theta$, say $\theta^2$, to be constant. The a priori distribution
of $\theta^2 = z$ is $g(\sqrt{z})/2\sqrt{z}$. If $g(\sqrt{z})/2\sqrt{z}$ is constant, then

$F(\theta \mid t)\, d\theta = \frac{\theta f(t \mid \theta)\, d\theta}{\int \theta f(t \mid \theta)\, d\theta}$

which is certainly inconsistent with (22).
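
The inconsistency can be made concrete with a small numerical experiment. Everything specific in the sketch below is an assumption introduced for illustration and is not taken from Wilks: $f(t \mid \theta)$ is taken to be normal about $\theta$ with unit standard deviation, $\theta$ is confined to the range 0 to 12, the observed statistic is $t = 2$, and the integrals are approximated by a midpoint rule.

```python
from math import exp


def likelihood(t, theta):
    """Hypothetical f(t | theta): normal about theta, unit s.d. (constant factor omitted)."""
    return exp(-0.5 * (t - theta) ** 2)


def posterior_prob_below(t, cutoff, weight, upper=12.0, steps=12000):
    """P(theta <= cutoff | t) when g(theta) is proportional to weight(theta) on [0, upper]."""
    h = upper / steps
    below = total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h                      # midpoint of the i-th subinterval
        mass = weight(theta) * likelihood(t, theta) * h
        total += mass
        if theta <= cutoff:
            below += mass
    return below / total


t = 2.0
constant_in_theta = posterior_prob_below(t, t, lambda theta: 1.0)       # form (22)
constant_in_theta_sq = posterior_prob_below(t, t, lambda theta: theta)  # theta^2 uniform
print(round(constant_in_theta, 4), round(constant_in_theta_sq, 4))
```

Under these assumptions the probability that $\theta$ does not exceed the observed $t$ is close to one half when $\theta$ is taken as uniformly distributed, but only about three tenths when $\theta^2$ is taken as uniformly distributed, although both choices profess the same ignorance.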

In arguing from a sample to the universe, any inference must be 
attended with some degree of uncertainty. But uncertainty should 
not be confused with lack of rigor. As we shall see, statements can 
be made about population parameters, subject to risks of being 
wrong, where the error is precisely expressed in terms of probability 
theory. In other words, the nature and degree of the uncertainty 
can be rigorously expressed. This can be accomplished without any 
assumptions regarding the a priori existence probabilities. 

7. Probable Error. The following concise exposition of the various
usages of the term "probable error" is due to Professor A. T. Craig.

There are in the literature three conceptions of the probable error.
If, purely for convenience of language, we refer to the probable error
of the mean, these conceptions can be stated as follows: (i) The
probable error of the mean is that deviation, extended on both sides
of the mean of the population, such that $\tfrac{1}{2}$ is the probability that the
mean of a sample will fall in this interval; (ii) The probable error of
a mean is that deviation, extended on both sides of the mean of a
sample, such that $\tfrac{1}{2}$ is the probability that the mean of the population
lies in this interval; (iii) The probable error of the mean is that deviation,
extended on both sides of the mean of a sample, such that $\tfrac{1}{2}$ is
the probability that the mean of another sample will fall in this interval.
Conception (i) leads without difficulty to the usual formula
$.6745(\sigma/\sqrt{N})$ for the probable error of the mean. This formula is
rigorously correct for samples of any size drawn from a normal population
and is valid for large samples drawn from any population with
finite variance. On the other hand, the formula cannot be established
under conception (ii) without further assumptions. If, before
the sample is drawn, it is assumed, in the absence of any knowledge
concerning the distribution of possible values of the mean of the
population, that the existence distribution is constant, then the
formula admits mathematical proof. But this assumption is essentially
the same assumption as that made in applying Bayes' Theorem
to problems of probability a posteriori.
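
Under conception (i) the constant .6745 is simply the deviation, in units of the standard error, that encloses the central half of a normal distribution. A minimal sketch of that calculation follows; the function name and the numerical values of $\sigma$ and $N$ are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def probable_error_of_mean(sigma, n):
    """Deviation e such that a sample mean falls within e of the population mean
    with probability 1/2, for a normal population with standard deviation sigma."""
    z = NormalDist().inv_cdf(0.75)      # upper quartile of the unit normal, about .6745
    return z * sigma / sqrt(n)


sigma, n, mean = 5.0, 25, 50.0
e = probable_error_of_mean(sigma, n)
sampling = NormalDist(mean, sigma / sqrt(n))            # distribution of the sample mean
coverage = sampling.cdf(mean + e) - sampling.cdf(mean - e)
print(round(e, 4), round(coverage, 4))
```

The check at the end confirms that the interval of half-width `e` about the population mean contains the sample mean with probability one half, which is conception (i) and nothing more.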

The modern method of expressing the reliability of a statistical 
estimate of a population parameter in terms of fiducial limits seems 
likely to replace the traditional but often misleading mode of expres¬ 
sion involving probable error. The rest of the chapter is devoted to 
this recent advance in statistical inference. 

8. Fiducial Theory. The material of this section is reproduced
from a recent paper on this subject by Rietz.

In explaining the meaning of the probable error of a statistic, one
of the usual types of definition is essentially the following: The
probable error of a statistic, $t$, is a positive number, $E_t$, such that the
chances are even that the population parameter of which $t$ is an estimate
from the sample, will fall within the interval $t - E_t$ to $t + E_t$.

This definition contains an inference about the values of a popula¬ 
tion parameter on the basis of information obtained from a random 
sample drawn from the population. 

Formulas for Et, in terms of observed data, when t may represent 
any one of a considerable number of statistics, say an arithmetic 
mean or a correlation coefficient, are usually listed for convenient 
application in numerous textbooks for teaching courses in sta¬ 
tistics. 

Under the definition stated above, it is noteworthy that these formulas
depend on a fundamental assumption whose validity has long
been in doubt. The assumption in question is to the effect that
initially, that is, before our drawings of a sample are made, in our
lack of knowledge about the distribution of possible values of an
unknown parameter, say of $\theta$, we may assume the existence distribution
of $\theta$ to be constant.

The invalidity of this assumption in many applied problems of
statistical interest may be seen clearly in cases of a continuous distribution
function with a derivative. Suppose that our initial assumptions
relating to a parameter $\theta$ were such that $\theta$ would initially be
distributed in accord with a continuous frequency function, $g(\theta)$,
which has a derivative at each point within its possible range on $\theta$,
say from $\theta = \alpha$ to $\theta = \beta$. Next, suppose $g(\theta)$ were restricted to be
constant throughout the range of $\theta$. Then it is well known that the
distribution of a simple non-linear function of $\theta$ would not be constant.
For example, the distribution of $z = \theta^n$ ($n \neq 1$, $\theta$ real and
non-negative) would not be constant; $z$ would be distributed in
accord with a frequency function proportional to $z^{(1-n)/n}$. But if $\theta$ is a population
parameter, it seems fairly obvious that the logical character
of our theory should usually, if not always, be such as to enable us to
use a power of $\theta$ as a parameter if we found it convenient to do so.

The preceding introduction is designed to lead up to the important
fact that, although in the usual statistical inquiry by sample, the
true value of the population parameter $\theta$ is unknown and remains
unknown, there are cases in which precise statements can be made in
terms of probabilities about the bounds within which a parameter $\theta$
lies without making an assumption about the initial distribution of the
possible values of $\theta$. It has been only about seven years since R. A.
Fisher initiated some important ideas in this connection to which
interesting contributions have been made by several mathematical
statisticians.

For simplicity, consider a case of a single parameter, $\theta$, in which
we know the frequency function of the statistic, $t$, to be given by an
integrable function

$$f(t, \theta) \tag{23}$$

where the values of $t$ obtained from observation may be assumed to
be good estimates of $\theta$. Suppose we know (23) in such form that it is
possible to calculate a table of values of the probabilities that the
statistic, $t$, will fall into an assigned interval selected on a possible
range $(a, b)$ for any assigned value of $\theta$ within the possible range
$(\alpha, \beta)$ of $\theta$.

Next, for illustration, select a positive number $\epsilon$, say $\epsilon = .005$,
on which to base a certain level of confidence about values of $\theta$ to be
expressed in terms of probabilities.

As our main problem may be clarified by a geometrical representation,
conceive of corresponding values of $t$ and $\theta$ obtained in an
extensive statistical experiment as represented by rectangular coordinates
within the rectangle bounded by lines $t = a$, $t = b$, $\theta = \alpha$,
$\theta = \beta$ (Fig. 23).

Consider an arbitrary assignment for $\theta$, say that $\theta'$ is the true
value of $\theta$. This gives the line $AB$ (Fig. 23). Since the distribution
of the statistic $t$ is assumed to be known for each assigned value of
$\theta$, we may locate on the line $AB$ two points, $t_1$ and $t_2$
($t_1 < t_2$), such that $\epsilon$ is equal to the probability that a random
sample will yield a value of $t$ less than or equal to $t_1$, and similarly
$\epsilon$ is the probability that such a sample will yield a value greater
than or equal to $t_2$. Then we have an interval on $AB$ from $t_1$
to $t_2$ such that $1 - 2\epsilon$ is the probability that the random sample will
yield a value within this interval.

More formally stated, we may introduce a function $F(t, \theta)$ defined
as the definite integral of $f(t, \theta)$ in (23) from $t = a$ to $t$. That is,

$$F(t, \theta) = \int_a^t f(t, \theta)\, dt,$$

for any arbitrarily assigned real value of $\theta$ on its range from $\alpha$ to $\beta$.
Then

$$F(a, \theta) = 0, \quad F(b, \theta) = 1, \quad F(t_1, \theta') = \epsilon, \quad F(t_2, \theta') = 1 - \epsilon, \quad (0 < \epsilon < 1).$$

By considering all possible assignments of $\theta'$ in its possible range
$(\alpha, \beta)$, the locus of our set of lower values of $t$, illustrated by
$t_1$ on the line $AB$, will give a continuous curve which we mark with
$C_\epsilon$ in Figure 23, the subscript $\epsilon$ being used to remind us that
$\epsilon$ is the probability that a random value of $t$ for $\theta = \theta'$ will
fall below or at $t_1$. Similarly, our set of upper values of $t$, illustrated
by $t_2$ on $AB$, give a curve which we mark with $C_{1-\epsilon}$.

If $t$ is a good estimate of $\theta$, its value usually, if not always, increases
with $\theta$ for all possible values. Thus, we shall restrict our further
considerations to cases in which we may assume that $t$ increases as $\theta$
increases and vice versa. More precisely we are concerned with
one-valued monotone increasing functions represented by the two
curves marked $C_\epsilon$ and $C_{1-\epsilon}$. The region bounded by these two curves
and the lines $\theta = \alpha$ and $\theta = \beta$ has been called by Neyman the
confidence belt with confidence coefficient equal to $1 - 2\epsilon$.

Next, consider the set of points, $(t, \theta)$, that would be obtained in
Figure 23 in carrying out an extensive statistical experiment for
which we seek a degree of accuracy in the long run, indicated by the
value we assign to $\epsilon$. Then it is fairly obvious that the confidence
belt is so constructed that $1 - 2\epsilon$ is the expected relative frequency
with which points, $(t, \theta)$, will lie inside the confidence belt,
and $2\epsilon$ is the expected relative frequency with which such points
will lie outside the confidence belt or on its boundary, whatever
the nature of the initial distribution function of the parameter $\theta$
may be.
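The claim that the belt holds the fraction $1 - 2\epsilon$ of the points whatever the initial distribution of the parameter can be illustrated by a small sampling experiment. This is a sketch under assumptions that are not in the text: the statistic is taken to be the mean of 25 items from a normal universe with unit standard deviation, $\epsilon$ is set to .05 to keep the run short, and the two initial distributions (one uniform, one strongly skewed) as well as the function name `belt_coverage` are hypothetical.

```python
import random
from statistics import NormalDist

def belt_coverage(draw_theta, eps, n_items, trials, rng):
    """Relative frequency with which the point (t, theta) falls inside the belt."""
    sd_t = 1.0 / n_items ** 0.5            # s.d. of the mean of n_items unit-variance items
    z = NormalDist().inv_cdf(1.0 - eps)    # normal deviate cutting off eps in one tail
    inside = 0
    for _ in range(trials):
        theta = draw_theta(rng)            # parameter drawn from its "initial distribution"
        t = rng.gauss(theta, sd_t)         # statistic t drawn from f(t, theta)
        if theta - z * sd_t < t < theta + z * sd_t:   # t1(theta) < t < t2(theta)
            inside += 1
    return inside / trials

rng = random.Random(1)
eps = 0.05
uniform_prior = belt_coverage(lambda r: r.uniform(0.0, 10.0), eps, 25, 50_000, rng)
skewed_prior = belt_coverage(lambda r: r.expovariate(0.5), eps, 25, 50_000, rng)
print(uniform_prior, skewed_prior)         # both should be near 1 - 2*eps = 0.90
```

Changing the initial distribution moves the points about inside Figure 23 but, in this sketch, leaves the long-run fraction inside the belt essentially unchanged.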

Conceive of drawing a large number of sets of random samples
of $N$ items each from a population consisting either of an infinite
supply or of a finite supply with replacements, and suppose that one of
these samples, taken at random, yields a value $t = t_0$ for a certain
statistic. Then the line $t = t_0$ parallel to the $\theta$-axis would fail to
intersect the boundaries of the confidence belt, in two points, in at most
a small fractional part (less than $2\epsilon$) of the total number of sets of
drawings. Denote the ordinates of the points in which the line
$t = t_0$ cuts the curves $C_{1-\epsilon}$ and $C_\epsilon$ by $\theta_1$ and $\theta_2$,
respectively (Figure 23). These boundary values of $\theta$ are called fiducial
limits of $\theta$ that correspond to $t = t_0$, and the interval $\theta_1$ to $\theta_2$
is called the fiducial interval for $t = t_0$. It is important to emphasize
that the statement that $1 - 2\epsilon$ is the probability that a value of $\theta$
taken at random will fall into the confidence belt is to be associated with
the whole belt, that is, with results of repeated application of a sampling
procedure to all values of $t$ met with in an extensive statistical
experiment, and not merely with an assigned $t$. The probability that
$(\theta, t)$ falls within the confidence belt may differ for different
assignments of $t$, but in the long run of statistical experience, the
expected relative frequency of points within the confidence belt is
$1 - 2\epsilon$. By choosing $\epsilon$ to be small, the probability is nearly 1 that
the parameter lies within the confidence belt.
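Reading the fiducial limits off the belt amounts to solving $F(t_0, \theta_1) = 1 - \epsilon$ and $F(t_0, \theta_2) = \epsilon$ for the parameter. The sketch below does this by bisection for an assumed case only: $t$ is taken to be normally distributed about $\theta$ with a known standard deviation, and the numbers `t0`, `sd_t`, and the search bounds are hypothetical.

```python
from statistics import NormalDist

def F(t, theta, sd_t):
    """F(t, theta): probability that the statistic falls at or below t (assumed normal)."""
    return NormalDist(theta, sd_t).cdf(t)

def solve_theta(t0, target, sd_t, lo, hi):
    """Bisection for the theta at which F(t0, theta) equals target.

    F(t0, theta) decreases as theta increases, since t increases with theta.
    """
    for _ in range(80):
        mid = (lo + hi) / 2
        if F(t0, mid, sd_t) > target:
            lo = mid          # theta still too small
        else:
            hi = mid
    return (lo + hi) / 2

eps = 0.005
sd_t = 0.5                    # assumed standard deviation of t
t0 = 10.0                     # assumed observed value of the statistic
theta_1 = solve_theta(t0, 1 - eps, sd_t, t0 - 10, t0 + 10)   # t0 lies on C_{1-eps}
theta_2 = solve_theta(t0, eps, sd_t, t0 - 10, t0 + 10)       # t0 lies on C_eps
print(round(theta_1, 3), round(theta_2, 3))
```

In this normal case the limits reduce to $t_0$ minus and plus 2.576 times the standard deviation of $t$, which gives a check on the bisection.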

The theory of confidence belts and fiducial intervals finds its
main application in the testing of a certain hypothesis for possible
rejection under the assumption that it is true. Such an hypothesis
has been termed a null hypothesis. If, for a given $\epsilon$, the null
hypothesis is rejected due to the value of $t$ found from the actual data, the
value of $t$ is said to be significant at the level of probability equal
to $2\epsilon$. On the other hand, a value of $t$ from observed data which
does not reject the null hypothesis is said to be non-significant.

9. Fiducial Limits. (a) For the mean. Let $\bar{x}$ and $s$ be the mean
and standard deviation of a sample of $N = n + 1$ items drawn from
a normal universe with unknown mean $\tilde{x}$. The problem is to determine
an interval surrounding $\bar{x}$ in which we may assume, with a
certain degree of confidence, that $\tilde{x}$ is contained. We learned in
Chapter VII that the variable

$$t = \frac{\sqrt{n}\,(\bar{x} - \tilde{x})}{s} \tag{24}$$

is distributed in accord with the $F_n(t)$ curve and that $P = 1 - P_n(t)$
has been tabulated for various values of $t$ and $n$, where

$$P_n(t) = 2 \int_0^t F_n(t)\, dt.$$

Therefore, for an assigned $\epsilon$ and for an assigned value of $n$ ($n < 30$),
we may obtain from the tables upper and lower critical values of $t$
by solving the equation $P = 2\epsilon$. With these critical values we can
determine from (24) the required interval surrounding $\bar{x}$ for the
given value of $\epsilon$. It is conventional among certain workers to take
$\epsilon = .005$ (or .025) since they wish to determine values of the estimates
of $\tilde{x}$ in an interval dividing hypotheses that will be rejected from
those acceptable under a null hypothesis at the 1% (or 5%) level of
significance.

Suppose, then, that we make the claim

$$\bar{x} - t_\epsilon \frac{s}{\sqrt{n}} < \tilde{x} < \bar{x} + t_\epsilon \frac{s}{\sqrt{n}} \tag{25}$$

and we desire the probability of an error in this statement to be not
more than $2\epsilon = .01$. Taking $n = 15$, for example, we find from
Table 12, Chapter VII, that $t = \pm 2.947$ when $P = .01$. Then we
have

$$\bar{x} - \tilde{x} = \pm \frac{2.947\, s}{\sqrt{15}} = \pm .76 s$$

and the claim

$$\bar{x} - .76 s < \tilde{x} < \bar{x} + .76 s$$

will be correct 99% of the time.
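The tabular entry used above can be checked without a table by integrating Student's curve numerically. The sketch below assumes the usual form of the ordinate of Student's distribution with n degrees of freedom for $F_n(t)$; the function names, the Simpson step count, and the bisection bounds are choices made for this illustration.

```python
import math

def student_density(t, n):
    """Ordinate of Student's curve with n degrees of freedom (assumed form of F_n)."""
    coef = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coef * (1 + t * t / n) ** (-(n + 1) / 2)

def central_area(t, n, steps=2000):
    """P_n(t) = 2 * integral of F_n from 0 to t, by Simpson's rule (steps is even)."""
    h = t / steps
    total = student_density(0.0, n) + student_density(t, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * student_density(k * h, n)
    return 2 * total * h / 3

def critical_t(two_eps, n):
    """Solve P = 1 - P_n(t) = two_eps for t by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - central_area(mid, n) > two_eps:
            lo = mid          # tails still too heavy, move outward
        else:
            hi = mid
    return (lo + hi) / 2

n = 15
t_eps = critical_t(0.01, n)
half_width = t_eps / math.sqrt(n)     # multiplier of s in the claim
print(round(t_eps, 3), round(half_width, 2))
```

For n = 15 and P = .01 this reproduces the critical value 2.947 and the multiplier .76 of s quoted in the text.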

It is clear from the above procedure that our confidence in the
fiducial limits $\bar{x} \pm t_\epsilon s/\sqrt{n}$ is measured by the area under the
$F_n(t)$ curve inside $t = \pm t_\epsilon$, that is, by $P_n(t_\epsilon)$. This means that
if we could observe all possible samples, the proportion represented by
$P_n(t_\epsilon)$ would yield values of $\bar{x}$ and $s$ for which the claim (25) is true,
while the remaining proportion, $P = 1 - P_n(t_\epsilon)$, would yield values
of $\bar{x}$ and $s$ for which the claim is false.



Fig. 24 

If we were testing a hypothetical value of $\tilde{x}$ we would say that
$\bar{x}$ is not significant at the 1% level of significance if $\tilde{x}$ has any value
in the $\bar{x} \pm t_\epsilon s/\sqrt{n}$ interval, $\epsilon = .005$. If $\tilde{x}$ does not lie in this
interval we say that $\bar{x}$ is significant at this level.

Obviously, values of $t$ satisfying the equation $P = .01$, that is,
$P_n(t) = .99$, vary with $n$. To avoid the trouble of entering a table
we give an alternate method which is valid when the sample is not
small. Recall that the variable

$$t = \frac{(\bar{x} - \tilde{x})\sqrt{N - 3}}{s}$$

is approximately normally distributed when $N > 30$. The area
under the normal curve outside $t = \pm 2.576$ is .01. Therefore, the
99% fiducial range of $\tilde{x}$ is then

$$\bar{x} \pm \frac{2.576\, s}{\sqrt{N - 3}}$$

and the range gets smaller as $N$ increases.
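The large-sample rule is easy to tabulate. The following sketch uses Python's standard `statistics.NormalDist` for the normal deviate; the function name `large_sample_half_width`, the unit value of s, and the sample sizes are assumptions made for illustration.

```python
from statistics import NormalDist

def large_sample_half_width(s, N, two_eps=0.01):
    """Half-width z * s / sqrt(N - 3) of the fiducial range for the mean (N > 30)."""
    z = NormalDist().inv_cdf(1 - two_eps / 2)   # deviate leaving two_eps outside +-z
    return z * s / (N - 3) ** 0.5

z_99 = NormalDist().inv_cdf(0.995)              # deviate for the 99% range
sizes = (31, 50, 100, 400)
widths = [large_sample_half_width(1.0, N) for N in sizes]
for N, w in zip(sizes, widths):
    print(N, round(w, 3))                       # half-width in units of s
```

The computed deviate agrees with the 2.576 used above, and the half-widths shrink steadily as N grows.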

(b) For the difference between two means. Let $\bar{x}_1$ and $s_1^2$ be the
observed mean and variance of a sample of $N_1$ drawn from a normal
universe with unknown mean $\tilde{x}_1$ and let $\bar{x}_2$ and $s_2^2$ be the observed
mean and variance of a sample of $N_2$ drawn from a normal universe
with unknown mean $\tilde{x}_2$. It is assumed that the two universes have
a common variance $\sigma^2$. For brevity, let $N = N_1 + N_2$,
$\bar{w} = \bar{x}_1 - \bar{x}_2$, and $\tilde{w} = \tilde{x}_1 - \tilde{x}_2$. Then the variable

$$t = \frac{\bar{w} - \tilde{w}}{\sqrt{\dfrac{N_1 s_1^2 + N_2 s_2^2}{N - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}} \tag{26}$$

is distributed in accord with $F_n(t)$ for $n = N - 2$. From (26),
upper and lower fiducial values of $\tilde{w}$ can be found by assigning to $t$
the solutions of $P_n(t) = .99$, that is, of $P = .01$. If the value
$\tilde{w} = 0$ falls outside the fiducial interval thus established, the
conclusion is that the difference between the means is significant at the
1% level. That is, $\tilde{w} \neq 0$ and hence $\tilde{x}_1 \neq \tilde{x}_2$.

If the two samples are equal in number so that the variates can be
paired in some manner we may compute (26) by a different method.
Let $N = N_1 = N_2$, $w = x_1 - x_2$, and compute $\bar{w}$ and
$\sum_1^N (w - \bar{w})^2$. Then, with $n = N - 1$ and $s_w$ the standard
deviation of the $N$ values of $w$,

$$t = \frac{(\bar{w} - \tilde{w})\sqrt{N - 1}}{s_w} = \frac{\bar{w} - \tilde{w}}{\sqrt{\dfrac{\sum (w - \bar{w})^2}{N(N - 1)}}} \tag{27}$$

The last expression is sometimes called Bessel's Formula.

Example 5. (Snedecor) Imagine a newly discovered apple, attractive in
appearance, delicious in flavor, having apparently all the qualifications of
success. It has been christened "King." Only its yielding capacities in various
localities is yet to be tested. The following procedure is decided upon. King is
planted adjacent to Standard in 15 orchards scattered about the region suitable
for production. Years later, when the trees have matured, the yields are
measured and recorded in the following table where $X_1$ refers to King, $X_2$ to
Standard, and $w = X_1 - X_2$. The yields are in bushels.


            X₁     X₂      w    (w − w̄)²

            13     11      2       16
            12      6      6        0
            10      3      7        1
             6      1      5        1
            13      7      6        0
            15     10      5        1
            19      9     10       16
            10      4      6        0
            11      3      8        4
            11      6      5        1
            13      8      5        1
             9      5      4        4
            14      7      7        1
            12      6      6        0
            12      4      8        4

  Totals                  90       50


Substituting in (27) we get

$$t = \frac{6 - \tilde{w}}{\sqrt{\dfrac{50}{(15)(14)}}} = \frac{6 - \tilde{w}}{.488}.$$

Interpolating in Table III for $n = 14$ and checking the result in the more
extensive table in Fisher's text we find that $P = .01$ when $t = 2.977$. Then
solving the equation

$$\frac{6 - \tilde{w}}{.488} = \pm 2.977$$

we obtain $\tilde{w} = 4.55$ and $\tilde{w} = 7.45$. Since $\tilde{w} = 0$ is outside the interval from
4.55 to 7.45, the observed value of $\bar{w}$ differs significantly from either value of the
parameter. In other words, for these as well as for all values outside the fiducial
interval 4.55-7.45, we would reject (at the 1% level of significance) the null
hypothesis that there is no significant difference between the yields of the two
varieties, insofar as their means provide a criterion of judgment.
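The arithmetic of Example 5 can be retraced directly from the fifteen paired yields. This is a sketch only: the variable names are chosen for illustration, and the critical value 2.977 is taken from the text rather than computed.

```python
import math

# Yields in bushels for the 15 orchards, as given in the table of Example 5.
king     = [13, 12, 10, 6, 13, 15, 19, 10, 11, 11, 13, 9, 14, 12, 12]
standard = [11, 6, 3, 1, 7, 10, 9, 4, 3, 6, 8, 5, 7, 6, 4]

w = [a - b for a, b in zip(king, standard)]      # paired differences
N = len(w)
w_bar = sum(w) / N                               # mean difference
sum_sq = sum((d - w_bar) ** 2 for d in w)        # sum of squared deviations
std_err = math.sqrt(sum_sq / (N * (N - 1)))      # denominator of (27)

t_crit = 2.977                                   # quoted in the text for n = 14, P = .01
lower = w_bar - t_crit * std_err
upper = w_bar + t_crit * std_err
print(sum(w), sum_sq, round(std_err, 3), round(lower, 2), round(upper, 2))
```

The totals 90 and 50, the divisor .488, and the fiducial limits 4.55 and 7.45 all agree with the figures in the example.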


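(c) For the variance. In (25) of Chapter VII we obtained the distribution
of $s^2$, which we will now write in the form

$$H(s^2)\, ds^2 = \frac{1}{\Gamma\left(\frac{N-1}{2}\right)} \left(\frac{N}{2\sigma^2}\right)^{(N-1)/2} (s^2)^{(N-3)/2}\, e^{-Ns^2/2\sigma^2}\, ds^2.$$

If we let $\chi^2 = Ns^2/\sigma^2$ we get the distribution given in (12) with
$N$ replacing $k$,

$$T(\chi^2)\, d\chi^2 = \frac{e^{-\chi^2/2}\, (\chi^2)^{(N-3)/2}}{2^{(N-1)/2}\, \Gamma\left(\frac{N-1}{2}\right)}\, d\chi^2.$$

That we should thus obtain (12) is more than a coincidence, because
it turns out that $Ns^2/\sigma^2$ actually is $\chi^2$ for $N$ observations made on a
single magnitude. If now we let $n = N - 1$ we obtain the distribution
for $n$ degrees of freedom,

$$T_n(\chi^2)\, d\chi^2 = \frac{e^{-\chi^2/2}\, (\chi^2)^{(n-2)/2}}{2^{n/2}\, \Gamma\left(\frac{n}{2}\right)}\, d\chi^2. \tag{28}$$

To determine the fiducial limits of $\sigma^2$ we first observe from (3) of
Chapter VII that $Ns^2 = \sum_1^N (x - \bar{x})^2$; therefore we may
write $\chi^2 = \sum_1^N (x - \bar{x})^2 / \sigma^2$. If now we make the claim

$$\frac{Ns^2}{\chi_2^2} < \sigma^2 < \frac{Ns^2}{\chi_1^2}$$

where $\chi_1^2$ and $\chi_2^2$ are arbitrarily chosen constants ($\chi_1^2 < \chi_2^2$), then
our measure of confidence in the correctness of this claim is given
by $I_n(\chi_1^2) - I_n(\chi_2^2)$, where

$$I_n(\chi^2) = \int_{\chi^2}^{\infty} T_n(\chi^2)\, d\chi^2.$$

Values of $I_n(\chi^2)$ can be obtained from Pearson's Tables.

Fig. 25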
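The measure of confidence for a claim about the variance can be evaluated numerically instead of from tables. The sketch below assumes the usual chi-square ordinate for n degrees of freedom and takes $I_n$ to be the area to the right of the given value; the sample size N = 11, the sample variance 4.0, and the two constants (the familiar 2.5% points for 10 degrees of freedom) are hypothetical choices for illustration, not values from the text.

```python
import math

def chi2_density(x, n):
    """Ordinate T_n of the chi-square curve with n degrees of freedom (n >= 2)."""
    return math.exp(-x / 2) * x ** ((n - 2) / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def upper_tail(x, n, steps=4000):
    """I_n(x): area to the right of x, as 1 minus the Simpson area on (0, x)."""
    h = x / steps
    total = chi2_density(0.0, n) + chi2_density(x, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * chi2_density(k * h, n)
    return 1 - total * h / 3

N = 11                         # hypothetical sample size
n = N - 1                      # degrees of freedom
s2 = 4.0                       # hypothetical sample variance (divisor N)
chi1_sq, chi2_sq = 3.247, 20.483   # assumed constants: 2.5% points for n = 10

confidence = upper_tail(chi1_sq, n) - upper_tail(chi2_sq, n)
lower = N * s2 / chi2_sq       # lower fiducial limit of the universe variance
upper = N * s2 / chi1_sq       # upper fiducial limit of the universe variance
print(round(confidence, 3), round(lower, 2), round(upper, 2))
```

With these assumed constants the confidence comes out close to .95, and the limits enclose the observed variance unsymmetrically, as the skewness of the chi-square curve would lead one to expect.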

For further study of fiducial inference and its applications to testing
hypotheses, the reader is referred to the publications of Fisher,
Neyman, and Wilks.
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Exercises 


1. Read the following paper: The $\chi^2$-Test of Significance, T. C. Fry, Journal of
   the American Statistical Association, vol. 33, pp. 513-525. (The three
   papers following Fry's exposition are also recommended.)

2. Toss seven coins 128 times and record the frequencies of heads. Apply the
   $\chi^2$-test to the resulting distribution.

3. Graduate an appropriate distribution in Part I by means of the normal
   curve and test the composite hypothesis that the observed distribution
   was a sample from a normal universe having the mean and standard
   deviation of the sample.

4. Give a report on $\chi^2$ and contingency tables.

5. (Chrystal) A bag contains three balls, each of which is either white or
   black, all possible numbers of white being equally likely. Two at once
   are drawn at random and prove to be white. What is the probability
   that all of the balls are white? Ans. 3/4.

6. If, in Example 4, it is assumed that, initially, all possible numbers of white
   balls in the urn are equally likely, what is the solution?

7. If $N$ is large show that the 95% fiducial range of $\tilde{x}$ for a normal universe is
   $\bar{x} \pm 1.96\, s/\sqrt{N - 3}$.

8. Making use of the references cited, prepare a report on fiducial inference.
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Review Problem 


A question arose in a physical education class as to whether eleven-year-old
girls weigh, as a rule, more than eleven-year-old boys. Suppose you wished to
make a thorough analysis of the data in the table below concerning weights of
boys and girls aged eleven. Describe the tests you might apply, the reasoning
and assumptions underlying these, and the interpretation that might be placed
on the results.


  Weight (pounds)        Frequency
  Class Marks          Boys    Girls

       42.5               1        0
       48.5               3        1
       54.5               9        7
       60.5              33       37
       66.5              65       41
       72.5              80       59
       78.5              72       58
       84.5              41       48
       90.5              27       23
       96.5               7       26
      102.5               4       16
      108.5               2        5
      114.5               1        3
      120.5               0        2

      Totals            345      326


The following points are suggested for discussion: 

(a) Is there a clear difference between the two distributions? How would 
you test this: from the means, from the variances, from the samples as a whole? 

(b) 32.3% of the boys and 26.4% of the girls have weights less than 69.5 
pounds. Is this difference significant? 

(c) Within what limits would you say that the mean and standard deviation 
in the population of eleven-year-old boys (from which you have the sample of 
345) is almost certain to lie in each case? 

(d) Summarize your results. 
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APPENDIX 


Tables 

I. Ordinates and Areas of the Normal Curve.

II. 5% and 1% Points for the Distribution of F.

III. $\chi^2$ Probability Scale.




Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(Entries lost in reproduction are shown as "....".)

    t     φ(t)     Area  |    t     φ(t)     Area  |    t     φ(t)     Area

   .00   .39894    ....  |   .45   .36053   .17364 |   .90   .26609   .31594
   .01   .39892    ....  |   .46   .35889   .17724 |   .91   .26369   .31859
   .02   .39886    ....  |   .47   .35723   .18082 |   .92   .26129   .32121
   .03   .39876   .01197 |   .48   .35553   .18439 |   .93   .25888   .32381
   .04   .39862   .01595 |   .49   .35381   .18793 |   .94   .25647   .32639
   .05   .39844   .01994 |   .50   .35207   .19146 |   .95   .25406    ....

    t     φ(t)     Area  |    t     φ(t)     Area  |    t     φ(t)     Area

  1.35   .16038   .41149 |  1.80    ....    .46407 |  2.25   .03174   .48778
  1.36   .15822   .41309 |  1.81    ....    .46485 |  2.26   .03103   .48809
  1.37   .15608   .41466 |  1.82   .07614   .46562 |  2.27   .03034   .48840
  1.38   .15395   .41621 |  1.83   .07477   .46638 |  2.28   .02965   .48870
  1.39   .15183   .41774 |  1.84   .07341   .46712 |  2.29   .02898   .48899
  1.40   .14973   .41924 |  1.85   .07206   .46784 |  2.30   .02833   .48928
  1.41   .14764   .42073 |  1.86   .07074   .46856 |  2.31   .02768   .48956
  1.42   .14556   .42220 |  1.87   .06943   .46926 |  2.32   .02705   .48983
  1.43   .14350   .42364 |  1.88   .06814   .46995 |  2.33   .02643   .49010
  1.44   .14146   .42507 |  1.89   .06687    ....  |  2.34   .02582   .49036
  1.45   .13943   .42647 |  1.90   .06562   .47128 |  2.35   .02522   .49061
  1.46   .13742   .42786 |  1.91   .06439    ....  |  2.36   .02463   .49086
  1.47   .13542   .42922 |  1.92   .06316   .47257 |  2.37   .02406   .49111
  1.48   .13344   .43056 |  1.93   .06195    ....  |  2.38   .02349   .49134
  1.49   .13147   .43189 |  1.94    ....    .47381 |  2.39   .02294   .49158
  1.50   .12952   .43319 |  1.95   .05959   .47441 |  2.40   .02239   .49180
  1.51   .12758   .43448 |  1.96   .05844    ....  |  2.41   .02186   .49202
  1.52   .12566   .43574 |  1.97    ....    .47558 |  2.42   .02134   .49224
  1.53   .12376   .43699 |  1.98   .05618   .47615 |  2.43   .02083   .49245
  1.54   .12188   .43822 |  1.99    ....     ....  |  2.44   .02033   .49266
  1.55   .12001   .43943 |  2.00   .05399   .47725 |  2.45   .01984   .49286
  1.56   .11816   .44062 |  2.01   .05292   .47778 |  2.46   .01936   .49305
  1.57   .11632   .44179 |  2.02   .05186   .47831 |  2.47   .01889   .49324
  1.58   .11450   .44295 |  2.03    ....    .47882 |  2.48   .01842   .49343
  1.59   .11270   .44408 |  2.04    ....    .47932 |  2.49   .01797   .49361
  1.60   .11092   .44520 |  2.05   .04879   .47982 |  2.50   .01753   .49379
  1.61   .10915   .44630 |  2.06    ....     ....  |  2.51   .01709   .49396
  1.62   .10741   .44738 |  2.07   .04682    ....  |  2.52   .01667   .49413
  1.63   .10567   .44845 |  2.08   .04586   .48124 |  2.53   .01625   .49430
  1.64   .10396   .44950 |  2.09   .04491   .48169 |  2.54   .01585   .49446
  1.65   .10226   .45053 |  2.10   .04398   .48214 |  2.55   .01545   .49461
  1.66   .10059   .45154 |  2.11   .04307   .48257 |  2.56   .01506   .49477
  1.67   .09893   .45254 |  2.12   .04217   .48300 |  2.57   .01468   .49492
  1.68   .09728   .45352 |  2.13   .04128   .48341 |  2.58   .01431   .49506
  1.69   .09566   .45449 |  2.14   .04041   .48382 |  2.59   .01394   .49520
  1.70   .09405   .45543 |  2.15    ....    .48422 |  2.60   .01358   .49534
  1.71   .09246    ....  |  2.16    ....    .48461 |  2.61   .01323   .49547
  1.72   .09089   .45728 |  2.17   .03788   .48500 |  2.62   .01289   .49560
  1.73   .08933   .45818 |  2.18    ....    .48537 |  2.63   .01256   .49573
  1.74   .08780    ....  |  2.19   .03626   .48574 |  2.64   .01223   .49585
  1.75   .08628   .45994 |  2.20   .03547   .48610 |  2.65   .01191   .49598
  1.76   .08478   .46080 |  2.21   .03470   .48645 |  2.66   .01160   .49609
  1.77   .08329   .46164 |  2.22   .03394   .48679 |  2.67   .01130   .49621
  1.78   .08183   .46246 |  2.23   .03319   .48713 |  2.68   .01100   .49632
  1.79   .08038   .46327 |  2.24   .03246   .48745 |  2.69   .01071   .49643


Table II. 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F


Table III. Table of $\chi^2$ Probability Scale (from R. A. Fisher's Table)

Chance of Exceeding Given Value of $\chi^2$

   .30     .20     .10     .05     .02

For larger values of $n$, $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be referred approximately to
the normal probability scale.
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Deming and Birge, 137,139 
De Moivre-Laplace theorem, 2ln 
Dichotomy, 9, 164 
Difference, testing significance of, 
between correlation coef¬ 
ficients, 155 
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means, 140,151 
proportions, 119 
sample variances, 145 
Distributions, generally, 43 
Bernoulli, 10 
binomial, 9 
Cauchy, 45 
Fisher’s t, 137
Fisher’s z-, 142
Gram-Charlier, 59 
joint, 63 
marginal, 64 
normal, 49, 55 
normal bivariate, 70 
of means, 133 
of standard deviations, 135 
of variances, 134 
Pearson Type III, 49 
Poisson exponential, 29 
“ Student’s,” 128 
Estimates, unbiased, 125 
Expected value, 
of mean, 102 

of standard deviation, 127,135 
of variance, 123, 135 
propositions concerning, 100 
Fiducial inference, theory of, 177 
Fiducial limits, 181 
Fisher’s derivation of “Student’s” 
distribution, 130 
Fisher’s t-distribution, 137
table of, 138

Fisher’s z-distribution, 142
Frequency curves, generally, 43 
Gram-Charlier, 59 
Pearson system of, 46 
Frequency surface, the (J, s)-, 
136 

normal, 70 

Fry, 5, 20n, 32, 39, 186
Function, distribution, 43 
Beta, 39 
Gamma, 35 

incomplete Beta and Gamma, 41 


Gamma function, 35 
Geary, 134, 160 
Gram-Charlier series, 59 
Hermite polynomials, 59 
Homoscedastic arrays, 73, 92 
Hotelling, 2 

Hypergeometric series, 54
Independence, definition of sta¬ 
tistical, 64 
Interaction, 149 
Irwin, 161 (ref. 22) 

Jackson, 95, 109 
Jacobian, 37 
Joint distributions, 63 
Large numbers, law of, 114 
Laplace, 175 
Levy and Roth, 5, 60n 
Limits, fiducial, 181 
Linear function, standard error 
of, 101 

Marginal distributions, 64 
Mathematical expectation, see 
expected value 
Means, 

test of significance of differ¬ 
ence between, 140 
testing variation in, 151 
Median, 61 
Mills, 162 
Molina, 174 

Moments, generally, 44, 64 
of Bernoulli distribution, 14
of distribution of means from 
arbitrary universe, 102 
Multinomial law, 164 
Multiple correlation, 78 
coefficient, 87 
regression, 82 
Normal curve, 55
connection with Gram-Charlier 
series, 59 

with Pearson system, 49 
normalized, 49 
quadrature of, 57 
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reproductive property of, 109 
Normal equations, 83 
Neyman, 160 

Null hypothesis, 116, 141, 181 
Parameters, unbiased estimates 
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Partial correlation, 91 
coefficient, 93 
Pearson, Karl, 41, 168 
Pearson, E. S., 116,121, 127,160 
Permutations and combinations, 
4 

Probability, 2, 51, 54 
a priori, 2, 174 
a posteriori, 2, 174 
distributions, 16, 43, 63 
inverse, 174 

scale of sampling fluctuations, 
115 

Probable error, 26,176 
Proportions, testing significance 
of difference between, 119 
Rider, 136n, 161 

Rietz, 3n, 5, 10, 18, 20n, 28n, 97, 
111, 141, 177 
Regression, 
curves, ^ 
lines, 67 
multiple, 82 

systems in normal surface, 71 
testing linear, 153 
Repeated trials, 7 
Reproductive property of normal 
law, 109 

Romanovsky, 135,136, 161 
Sample, 97 

size of, to have a given reliability, 118
Salvosa, 51 
Shewhart, 103,113 
Significance, rule for level of, 117, 
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Small samples, generally, 123,129 
Snedecor, 145, 153, 184
Soper, 168 

Standard error of estimate, 85 
Standard deviation of estimated 
values, 87 
Statistic, 98, 125 
Statistical independence, defini¬ 
tion of, 64 

Stirling’s approximation, 38, 165 
Stochastic convergence, 115 
Struik, 5 

“ Student’s ” distribution, 128 
t-distribution, 137
Tchebycheff’s inequality, 113 
Testing hypotheses about fre¬ 
quency distributions, 168 
linearity of regression, 153 
Testing significance of 
correlation coefficients, 155 
difference between means, 140, 
151 

difference between proportions,
119 

mean, when universe variance 
is known, 116 

mean, when universe variance 
is unknown, 128 
ratio between variances, 145 
Tetrachoric correlation, 74 
Thurstone, 76 
Total correlation, 80 
Transformation of correlation coefficient, 159

Type A distribution, Gram-Char- 
lier, 59 

Type III distribution, Pearson, 
49 

Unbiased estimates of population 
parameter, 125 
Universe, 
finite, 112 
non-normal. 111 
Uspensky, 5 
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Wilks, 186 
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